Fetching data from an external resource is a common use case in many applications. This post grew out of one such case, which I stumbled upon while debugging a very low requests-per-second rate on the Sinatra Recipes website. It is a Sinatra application that, on the home page, fetches the contributors list by connecting to GitHub, and it does this every time the page is requested. To avoid API rate limits (60 requests per hour for unauthenticated requests to GitHub) and to improve performance, those requests can be cached. This series of posts explains ways to cache HTTP requests made using the Faraday library.
Faraday’s design provides a way to hook middleware into the request-response cycle. There are libraries built specifically for caching HTTP requests that are mountable as middleware, for example, faraday_middleware and faraday-http-cache. Although the name of the former can be confusing, it is a set of middleware libraries maintained as part of the Faraday project, and one of those middlewares is a caching API.
In part 1 of the series, we’ll be looking at faraday_middleware. The program we’ll be using fetches the stargazer count for the sinatra/sinatra repo using GitHub’s API. Here’s how the basic program looks:
This was the output on my machine (last three lines):
FaradayMiddleware
FaradayMiddleware provides a caching plugin with which one can achieve rudimentary caching. I should point out that the way a cache store is managed depends on the store’s library and not on the middleware library. This is true for faraday-http-cache as well. What this means is that selecting a cache store (such as Memcached, a database, the file system or memory) also requires one to figure out which client library to use (for example, Memcached or Dalli) and whether that library is compatible with the middleware one is using. As far as I was able to figure out, the Memcached library can’t be used with FaradayMiddleware, but Dalli works well.
To check whether a cache store library[1] can be used with FaradayMiddleware, the prerequisite is that the library should, at a minimum, implement the write, read and fetch methods. read takes a key and returns the corresponding value from the store; write takes a key and a value and saves the value to the store. The fetch method is a hybrid of the two: it takes a key and a block, and executes the block only if the provided key has no corresponding value in the store. Here are the steps it should execute:
- Check if a value for the given key is present in the store.
- If the value for a key is not present in the store, evaluate the block and save the result as the value for that key.
- If the value for the key is present, then return the value without any update.
This means that we can roll our own caching store API by implementing these 3 methods, and hook it onto Faraday’s request-response cycle. And that’s what we’re going to do for the rest of the article: write our own cache store library that uses a Ruby hash as the data store. The three requisite methods can be implemented in this way:
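As a minimal sketch of the three methods described above, assuming a plain Hash exposed as store_hash (the class and attribute names follow the ones used in this post, but the method bodies are an illustration rather than the original listing):

```ruby
# A minimal Hash-backed cache store (illustrative sketch).
class MyCacheStore
  attr_reader :store_hash

  def initialize
    @store_hash = {}
  end

  # Return the cached value for the key, or nil when nothing is stored.
  def read(key)
    store_hash[key]
  end

  # Save the value under the key and return the value.
  def write(key, value)
    store_hash[key] = value
  end

  # Return the cached value when the key is present; otherwise evaluate
  # the block, store its result under the key and return that result.
  def fetch(key)
    return read(key) if store_hash.key?(key)

    write(key, yield)
  end
end

store = MyCacheStore.new
calls = 0
first  = store.fetch("stargazers") { calls += 1; "fresh response" }
second = store.fetch("stargazers") { calls += 1; "another response" }

puts first   # fresh response
puts second  # fresh response
puts calls   # 1
```

The second fetch never runs its block, which is the behaviour the middleware relies on to skip the HTTP request.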
Now, all we have to do is provide an instance of MyCacheStore to the FaradayMiddleware::Caching plugin, which will be mounted as middleware on the Faraday connection object. The updated code looks like this:
Here’s the output on my machine when this code gets run:
This is the bare minimum cache store that will work. Well, almost.
What our custom store library implementation is missing is a way to invalidate the cache. In the example, for as long as store_hash exists, the response will always be the same.
If the 10.times {...} call in fetcher.rb is changed to the following, the program will run for the next 10 days without stopping, making one request every day and, after 10 days, returning pretty much the same result as above.
What if, in these 10 days, more people starred the repository? Since we haven’t told our cache when the caching needs to be bypassed, we will get the same result. The only way to force the request to go through, in this case, is to restart the program. Any useful caching mechanism has some way to invalidate a key. We’ll implement a simple way in the next section.
Cache invalidation
In FaradayMiddleware, the cache key (the key for the hash in our store library) is the URI of the remote resource, including the query params[2].
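To make the idea of a URI-based key concrete, here is a rough sketch using only Ruby’s standard library. The cache_key helper and its ignore_params keyword are hypothetical names made up for this illustration; this is not the middleware’s own code, only an approximation of the behaviour described here and in footnote 2.

```ruby
require "uri"

# Hypothetical helper: derive a cache key from a URL, dropping any
# query params listed in ignore_params.
def cache_key(url, ignore_params: [])
  uri = URI.parse(url)
  params = URI.decode_www_form(uri.query.to_s)
  kept = params.reject { |name, _value| ignore_params.include?(name) }
  uri.query = kept.empty? ? nil : URI.encode_www_form(kept)
  uri.to_s
end

puts cache_key("https://api.github.com/repos/sinatra/sinatra?page=2&access_token=abc",
               ignore_params: ["access_token"])
# https://api.github.com/repos/sinatra/sinatra?page=2
```

Two requests that differ only in an ignored param then map to the same key, so the second one is served from the store.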
In essence, the middleware never expires a key; expiry has to be built into the store library instead. For this post, we will settle for an invalidation mechanism based on time. In the next part, we’ll implement a more complex caching mechanism that utilizes HTTP headers. We’ll pass in an option while initializing the cache store to tell it how long (in seconds) to keep the cached value. The resulting MyCacheStore implementation would look like this:
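A minimal sketch of such a time-based store, assuming each entry is kept as a small hash holding the value and its creation time (the :refresh_in option and store_hash come from this post; the entry layout and method bodies are assumptions for illustration):

```ruby
# Hash-backed cache store with time-based expiry (illustrative sketch).
class MyCacheStore
  attr_reader :store_hash, :refresh_in

  # options[:refresh_in] is the lifetime of an entry in seconds;
  # when it is not given, entries never expire.
  def initialize(options = {})
    @store_hash = {}
    @refresh_in = options.fetch(:refresh_in, Float::INFINITY)
  end

  # Return the value only while the entry is younger than refresh_in.
  def read(key)
    entry = store_hash[key]
    return nil if entry.nil?
    return nil unless Time.now - entry[:created_at] < refresh_in

    entry[:value]
  end

  # Store the value together with its creation time; return the value.
  def write(key, value)
    store_hash[key] = { value: value, created_at: Time.now }
    value
  end

  # Return a fresh cached value, or evaluate the block and store its result.
  def fetch(key)
    cached = read(key)
    return cached unless cached.nil?

    write(key, yield)
  end
end

never_expires = MyCacheStore.new
expires_fast  = MyCacheStore.new(refresh_in: 0.0001)

cached_calls = 0
2.times { never_expires.fetch("stargazers") { cached_calls += 1 } }

fresh_calls = 0
2.times do
  expires_fast.fetch("stargazers") { fresh_calls += 1 }
  sleep 0.01 # let the entry outlive its 0.0001 second lifetime
end

puts cached_calls # 1
puts fresh_calls  # 2
```

One consequence of this sketch is that a stored nil is treated as a miss, which is acceptable here because an HTTP response is never nil.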
Our store_hash now stores the creation time of each key-value pair. When reading the value for a key, we check whether the time elapsed since it was stored is less than the :refresh_in value. Also notice that we set :refresh_in to Infinity if the option is not given, which means that the cache never expires. These are the results after I ran the program with a specific expiration time and a default expiration time:
In the first run, the store is initialized with no options and the result is similar to what we had earlier in the basic cache implementation example. In the second run, the store has been initialized with the :refresh_in value set to a really low value of 0.0001 (seconds) and the result is similar to the one where there was no caching whatsoever.
This way of cache invalidation is not a bulletproof strategy. It works fine when, say, you want to fetch a particular tweet: Twitter never expires the tweet URL, which just contains the tweet’s ID. But it would be a bad idea if we were to fetch search results for a hashtag, since the search results potentially keep changing all the time. Even with the GitHub example that has been the subject of this post, having a pre-set expiry time for a cache key is not the most efficient solution. We’ll implement a better one by iterating on this in the next post. Stay tuned.
Footnotes
1. Some of the stores that are compatible with FaradayMiddleware are:
   - Dalli
   - ActiveSupport::Cache::FileStore
   - ActiveSupport::Cache::MemCacheStore
   - ActiveSupport::Cache::MemoryStore

   The ActiveSupport links above are for the source code of each of those. The documentation on how to use these stores is provided in the Rails Guides.
2. If we want the middleware library to ignore certain params of the URI while generating the cache key, we can pass in those params as an option while declaring the middleware.