HTTP Request Response Caching Using Faraday: Part 1

27 Dec 2014

Fetching data from an external resource is a common use case in many applications. This post grew out of one such case that I stumbled upon while debugging the low requests-per-second throughput of the Sinatra Recipes website. It is a Sinatra application that, on the home page, fetches the contributors list by connecting to GitHub, and it does this every time the page is requested. To avoid API rate limits (60 requests per hour in the case of non-authenticated requests to GitHub) and to improve performance, those requests can be cached. This series of posts will explain ways to cache HTTP requests made using the Faraday library.

Faraday’s design provides a way to hook middleware into the request-response cycle. There are libraries built specifically for caching HTTP requests that are mountable as middleware, for example, faraday_middleware and faraday-http-cache. Although the name of the former can be confusing, it is a set of middleware libraries maintained as part of the main library itself, one of those middlewares being a caching API.

In part 1 of the series, we’ll be looking at faraday_middleware. The program we’ll be using fetches the stargazer count for the sinatra/sinatra repo using GitHub’s API. Here’s how the basic program looks:

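# complete_without_caching_example.rb

require 'faraday'

CONNECTION = Faraday.new(url: 'https://api.github.com/') do |faraday|
  faraday.request  :url_encoded
  # Log every response to STDOUT; declared before the adapter, which comes last
  faraday.response :logger
  faraday.adapter  Faraday.default_adapter
end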

load File.expand_path('../', __FILE__) + '/fetcher.rb'

# fetcher.rb

require 'json'

FETCHER = lambda do |counts, times|
  start_time = Time.now
  # GitHub's API returns a JSON response
  response = CONNECTION.get('/repos/sinatra/sinatra/stargazers')
  end_time = Time.now

  response_hash = JSON.load(response.body)
  counts << response_hash.count
  times << (end_time - start_time).round(2)
end

counts = []
times  = []

10.times do
  FETCHER.call(counts, times)
end

puts "Number of stargazers:    #{counts.uniq.first}"
puts "Number of requests made: #{counts.count}"
puts "Response times: #{times}"

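This was the output on my machine (last three lines). The count of 30 reflects the first page of the paginated stargazers list (GitHub’s default page size), not the repository’s total: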

❯ ruby complete_without_caching_example.rb
(…after a lot of response logging output…)

Number of stargazers:    30
Number of requests made: 10
Response times: [1.3, 1.03, 1.02, 1.01, 1.15, 0.99, 1.18, 1.1, 1.11, 1.03]

The logger logs all responses that were returned from GitHub. Without any caching, 10 responses get logged to STDOUT.

FaradayMiddleware

FaradayMiddleware provides a caching plugin with which one can achieve rudimentary caching. I should point out that the way a cache store is managed depends on the store’s library and not on the middleware library. This is true for faraday-http-cache as well. What this means is that selecting a cache store (such as Memcached, a database, the file system or memory) also requires one to figure out which client library to use (for example, Memcached or Dalli) and whether that library is compatible with the middleware one is using. As far as I was able to figure out, the Memcached library can’t be used with FaradayMiddleware, but Dalli works well.

To check if a cache store library (see footnote 1) can be used with FaradayMiddleware, the prerequisite is that the library should, at a minimum, implement write, read and fetch methods. read takes a key and returns the corresponding value from the store; write takes a key and a value and saves the value to the store. The fetch method is a hybrid of the two: it takes a key and a block, and executes the block only if the provided key has no corresponding value in the store. Here are the steps it should execute:

  1. Check if a value for the given key is present in the store.
  2. If the value for the key is not present in the store, evaluate the block, save the result as the value for that key, and return it.
  3. If the value for the key is present, return it without any update.

store.fetch('existent_key')
#=> returns the value for the key `existent_key`

store.fetch('non_existent_key') { 100 }
#=> returns 100 and sets the value for the key `non_existent_key` to 100

That said, the latest version of the middleware doesn’t seem to use fetch at all.

This means that we can roll our own cache store API by implementing these 3 methods and hook it into Faraday’s request-response cycle. And that’s what we’re going to do for the rest of the article: we’ll write our own cache store library and use a Ruby hash as the data store. The three requisite methods can be implemented in this way:

# my_cache_store.rb

class MyCacheStore
  attr_accessor :store_hash

  def initialize
    @store_hash = {}
  end

  def write(key, value)
    @store_hash[key.to_sym] = value
  end

  def read(key)
    @store_hash[key.to_sym]
  end

  def fetch(key)
    result = read(key)
    return result unless result.nil?

    # Only a cache miss needs the block to compute the value
    raise ArgumentError, 'Block should be provided' unless block_given?

    result = yield
    write(key, result)
    result
  end
end
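
Before wiring the store into Faraday, it can help to sanity-check the contract in isolation. The snippet below is only a minimal sketch: TinyStore is a hypothetical, compressed stand-in for MyCacheStore (repeated so the snippet runs on its own), and cache_store_compatible? is an illustrative helper, not something FaradayMiddleware provides.

```ruby
# TinyStore is a hypothetical, compressed stand-in for MyCacheStore.
class TinyStore
  def initialize
    @store_hash = {}
  end

  def write(key, value)
    @store_hash[key.to_sym] = value
  end

  def read(key)
    @store_hash[key.to_sym]
  end

  def fetch(key)
    cached = read(key)
    return cached unless cached.nil?

    write(key, yield)
  end
end

# Illustrative helper (an assumption, not a FaradayMiddleware API): checks
# that an object responds to the three methods discussed in this post.
def cache_store_compatible?(store)
  %i[read write fetch].all? { |name| store.respond_to?(name) }
end

store = TinyStore.new
calls = 0

first  = store.fetch('stargazers') { calls += 1; 100 }
second = store.fetch('stargazers') { calls += 1; 200 }

puts cache_store_compatible?(store)  # the store exposes all three methods
puts cache_store_compatible?({})     # a plain Hash has no read or write
puts [first, second, calls].inspect  # the block ran only on the first call
```

The second fetch never evaluates its block, which is the behaviour the three steps above ask for.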

Now, all we have to do is provide an instance of MyCacheStore to the FaradayMiddleware::Caching plugin, which will be mounted as a middleware on the Faraday connection object. The updated code looks like this:

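# complete_custom_store_example.rb

require 'faraday'
require 'faraday_middleware'
require_relative 'my_cache_store'

store = MyCacheStore.new

CONNECTION = Faraday.new(url: 'https://api.github.com/') do |faraday|
  faraday.use FaradayMiddleware::Caching, store
  faraday.request  :url_encoded
  # The logger sits inside the cache, so it only logs requests that miss it
  faraday.response :logger
  faraday.adapter  Faraday.default_adapter
end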

load File.expand_path('../', __FILE__) + '/fetcher.rb'

Here’s the output on my machine when this code gets run:

❯ ruby complete_custom_store_example.rb

Number of stargazers:    30
Number of requests made: 10
Response times: [1.03, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

Notice that only the first request went to GitHub’s servers and took a while to respond, and that subsequent requests were almost instantaneous. The response logger logs only 1 request this time.
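
To see why only the first request reaches the network, here is a rough sketch of what a caching middleware does with the store. It is a simplified model and not FaradayMiddleware’s actual source: SketchCachingMiddleware, HashStore and the fake_adapter lambda are hypothetical names, and the real middleware handles more cases than this.

```ruby
# Hypothetical in-memory store exposing the read/write contract.
class HashStore
  def initialize
    @data = {}
  end

  def read(key)
    @data[key]
  end

  def write(key, value)
    @data[key] = value
  end
end

# Simplified model of a caching middleware: `app` is the next link in the
# chain (ultimately the adapter that talks to the network).
class SketchCachingMiddleware
  def initialize(app, store)
    @app = app
    @store = store
  end

  def call(url)
    cached = @store.read(url)
    return cached unless cached.nil? # cache hit: skip the rest of the chain

    response = @app.call(url)        # cache miss: continue down the chain
    @store.write(url, response)
    response
  end
end

network_calls = 0
# Stand-in for the HTTP adapter; it only counts how often it is reached.
fake_adapter = lambda do |url|
  network_calls += 1
  "response for #{url}"
end

connection = SketchCachingMiddleware.new(fake_adapter, HashStore.new)
responses = Array.new(10) { connection.call('/repos/sinatra/sinatra/stargazers') }

puts "Calls made:          #{responses.count}"
puts "Reached the adapter: #{network_calls}"
```

Ten calls are made, but the stand-in adapter is reached only once; the other nine are answered from the store.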

MyCacheStore is the bare-minimum cache store that will work. Well, almost.

What our custom store implementation is missing is a way to invalidate the cache. In the example, as long as store_hash stays in memory, the response will always be the same.

If the 10.times {...} call in fetcher.rb is changed to the following, the program will run for the next 10 days without stopping, calling the fetcher once a day and, after 10 days, printing pretty much the same result as above.

10.times do
  FETCHER.call(counts, times)
  sleep 86400
end

What if, in these 10 days, more people starred the repository? Since we haven’t told our cache when the caching needs to be bypassed, we will get the same result. The only way to force the request to go through, in this case, is to restart the program. Any useful caching mechanism has some way to invalidate a key. We’ll implement a simple way in the next section.

Cache invalidation

In FaradayMiddleware, the cache key (the key for the hash in our store library) is the URI for the remote resource, including the query params (see footnote 2).

In essence, the key is never expired by the middleware. This has to be built into the store library instead. For this post, we will settle for an invalidation mechanism based on time. In the next part, we’ll implement a more complex caching mechanism that utilizes HTTP headers. We’ll pass in an option while initializing the cache store to tell it how long (in seconds) to keep the cached value. The resulting implementation, built on top of MyCacheStore, would look like this:

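# my_store_with_time_expiry.rb

require_relative 'my_cache_store'

class MyCacheStoreWithTimeExpiry < MyCacheStore
  attr_reader :options

  def initialize(options = {})
    super()
    @options = options
    # Entries never expire unless :refresh_in (in seconds) is given
    @refresh_in = @options.fetch(:refresh_in) { Float::INFINITY }
  end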

  def write(key, value)
    @store_hash[key.to_sym] = { value: value, created_at: Time.now }
  end

  def read(key)
    value_hash = @store_hash[key.to_sym]
    return nil unless value_hash
    if Time.now - value_hash[:created_at] > @refresh_in
      nil
    else
      value_hash[:value]
    end
  end
end

Our store_hash now stores the creation time of each key-value pair. When reading the value for a key, we check whether the time elapsed since it was stored is still within the :refresh_in value; if it is not, read returns nil, just as it does for a key that was never stored. Also notice that :refresh_in defaults to Infinity if the option is not set, which means that the cache never expires. These are the results after I ran the program with the default expiration time and with a specific one:

store = MyCacheStoreWithTimeExpiry.new

❯ ruby complete_custom_store_time_expiry_example.rb
Number of stargazers:    30
Number of requests made: 10
Response times: [1.51, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

There will be only one call to the GitHub API, similar to the case with MyCacheStore.

store = MyCacheStoreWithTimeExpiry.new(refresh_in: 0.0001)

❯ ruby complete_custom_store_time_expiry_example.rb
Number of stargazers:    30
Number of requests made: 10
Response times: [1.2, 1.1, 1.01, 2.48, 0.98, 0.99, 1.08, 1.23, 1.43, 1.13]

This time, there will be 10 calls to GitHub logged to STDOUT.

In the first run, the store is initialized with no options and the result is similar to what we had earlier in the basic cache implementation example. In the second run, the store has been initialized with the :refresh_in value set to a really low value of 0.0001 (seconds) and the result is similar to the one where there was no caching whatsoever.
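
Waiting for real time to pass makes expiry awkward to experiment with. One option, sketched here as an assumption rather than as part of the code above, is to let the store take a clock object so that time can be advanced by hand. ExpiringStore and its clock: keyword are hypothetical; the expiry check itself mirrors MyCacheStoreWithTimeExpiry#read.

```ruby
# Hypothetical variant of the time-expiry store with an injectable clock.
class ExpiringStore
  def initialize(refresh_in: Float::INFINITY, clock: -> { Time.now })
    @refresh_in = refresh_in
    @clock = clock
    @store_hash = {}
  end

  def write(key, value)
    @store_hash[key.to_sym] = { value: value, created_at: @clock.call }
  end

  def read(key)
    entry = @store_hash[key.to_sym]
    return nil unless entry

    # Same rule as in the post: entries older than refresh_in count as missing.
    @clock.call - entry[:created_at] > @refresh_in ? nil : entry[:value]
  end
end

now = Time.at(0)
store = ExpiringStore.new(refresh_in: 60, clock: -> { now })

store.write('stargazers', 30)
fresh = store.read('stargazers') # read immediately after writing

now += 61                        # pretend 61 seconds have passed
stale = store.read('stargazers') # the entry is now older than refresh_in

puts fresh.inspect
puts stale.inspect
```

The first read returns the stored value and the second returns nil, without the program ever sleeping.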

This way of invalidating the cache is not a bullet-proof strategy. One case where this scheme would work fine is, say, fetching a particular tweet; Twitter never expires the tweet URL, which just contains the tweet’s ID. But it would be a bad idea if we were to fetch search results for a hashtag, since the search results potentially keep changing all the time. Even with the GitHub example that has been the subject of this post, having a pre-set expiry time for a cache key is not the most efficient solution. We’ll implement a better one by iterating on this in the next post. Stay tuned.


Footnotes

1^ Some of the stores that are compatible with FaradayMiddleware are:

The ActiveSupport links above are for the source code of each of those. The documentation on how to use these stores is provided in the Rails Guides.

2^ If we want the middleware library to ignore certain params of the URI while generating the cache key, we can pass in those params as an option while declaring the middleware.

request_uri = 'https://example.com/?id=10&name=kashyap'

faraday.use FaradayMiddleware::Caching, store
#=> cache key is equal to request_uri

faraday.use FaradayMiddleware::Caching, store, ignore_params: ['name']
#=> cache key is equal to 'https://example.com/?id=10'
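
The effect of ignoring params on the key can be approximated in plain Ruby. This is only a sketch of the idea using the standard uri library; sketch_cache_key is a hypothetical helper and not how the middleware itself builds its keys.

```ruby
require 'uri'

# Illustrative helper (not the library's implementation): drops the given
# query params from a URL to form a cache key.
def sketch_cache_key(url, ignore_params: [])
  uri = URI.parse(url)
  if uri.query
    pairs = URI.decode_www_form(uri.query)
    kept = pairs.reject { |name, _| ignore_params.include?(name) }
    uri.query = kept.empty? ? nil : URI.encode_www_form(kept)
  end
  uri.to_s
end

request_uri = 'https://example.com/?id=10&name=kashyap'

puts sketch_cache_key(request_uri)
puts sketch_cache_key(request_uri, ignore_params: ['name'])
```

With name ignored, requests that differ only in that param map to the same key and therefore share one cached response.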