Efficient Counter Caches

Counter caches are great until you get 20,000 people following the same influencer and the counter cache on that person locks the whole database. The goal of this post is to go through the basics of counter caches explaining possible alternatives to overcome this.

What are counter caches for?
Counter caches were created to cache the counts of associated records.
Image we are developing a social media application where we have influencers and fans:

class Influencer < ApplicationRecord
  has_many :fan_influencers
  has_many :fans, through: :fan_influencers
end

class Fan < ApplicationRecord
  has_many :fan_influencers
end
class FanInfluencer < ApplicationRecord
  belongs_to :fan
  belongs_to :influencer
end

Everytime we display an influencer in the list we want to show how many followers he has.

The naive approach to achieve this is doing a count of the associated fan records for each influencer.

some_influencer.fans.count

While this might be enough if we are rendering only one influencer, it definitely does not scale well if we are going to render a list of many influencers, since it’s going to trigger a select count query for each influencer.

json.influencers @influencers do |influencer|
    json.name influencer.name
    json.fans_count influencer.fans.count
end

In the logs we get:

Influencer Load (1.1ms) SELECT "influencers".* FROM "influencers"
(0.9ms) SELECT COUNT(*) FROM "fans" INNER JOIN "fan_influencers" ON "fans"."id" = "fan_influencers"."fan_id" WHERE "fan_influencers"."influencer_id" = $1 [["influencer_id", 1]]
(0.7ms) SELECT COUNT(*) FROM "fans" INNER JOIN "fan_influencers" ON "fans"."id" = "fan_influencers"."fan_id" WHERE "fan_influencers"."influencer_id" = $1 [["influencer_id", 2]]
(1.0ms) SELECT COUNT(*) FROM "fans" INNER JOIN "fan_influencers" ON "fans"."id" = "fan_influencers"."fan_id" WHERE "fan_influencers"."influencer_id" = $1 [["influencer_id", 3]]
(1.3ms) SELECT COUNT(*) FROM "fans" INNER JOIN "fan_influencers" ON "fans"."id" = "fan_influencers"."fan_id" WHERE "fan_influencers"."influencer_id" = $1 [["influencer_id", 4]]

… and goes on for each influencer…

That’s when counter caches gain importance. Counter caches are a built in Rails feature that allows you to track the associated records count by doing almost nothing. It only requires you to add a new integer column in the influencers table and let the associated model know that it has to update that column every time a records gets created or destroyed (or is no longer referencing the influencer).

To use Rails counter caches you need to create a migration to add a new integer column to the associated model:

add_column :fan_influencers, :fans_count, :integer, default: 0, null: false

Let the FanInfluencer model know it has to update the counter cache in the influencer model:

class FanInfluencer < ApplicationRecord
  belongs_to :fan
  belongs_to :influencer, counter_cache: :fans_count
end

Reset the counters for existing records with:

    Influencer.find_each { |influencer| Influencer.reset_counters(influencer.id, :fans_count) }

And then you are ready to use the new column like you would with any other attribute:

json.influencers Influencer.all do |influencer|
  json.name influencer.name
  json.fans_count influencer.fans_count
end

Using the new counter cache the db will execute only one query that will select the influencers to be rendered, since fans_count is now just a column on the influencer and no counts or joins with other tables are needed.

Influencer Load (1.4ms) SELECT "influencers".* FROM "influencers"

(Should you need more complex counter caches, consider using Counter culture gem that works around some deficiencies in Rails original counter caches.)

So… what’s the problem?

Counter Caches have a fairly simple implementation using callbacks to keep this column updated. Every time a fan influencer record is created, updated, or destroyed, the counter cache on the corresponding influencer needs to get updated. This might make these type of operations slower. If that’s a cost you can take, go ahead. But consider the following situation:

Imagine a very famous influencer has been advertised and suddenly all of our users click on ‘Follow’. Every ‘Follow’ action on that influencer creates a fanInfluencer record, which will end in the db trying to execute many update queries to the counter cache of the influencer around the same time, locking up that db row and making our users requests time out. Ugh…

First solution. Use an SQL alternative to dynamically calculate the count without falling into the n+1 problem raised before introducing the counter caches.

We need to add the following scope on the influencer model:

  scope :with_fans_counter, -> {
    select <<~SQL
      influencers.*,
      (
        SELECT COUNT(fan_influencers.id) FROM fan_influencers
        WHERE influencer_id = influencers.id
      ) AS fans_count
    SQL
  }

When we load the influencers to render we need to use the new scope:

Influencer.with_fans_counter

And then we just call fans_count for a particular influencer to get the counter.

In the logs we get:

Influencer Load (2.1ms) SELECT influencers.*,
(
SELECT COUNT(fan_influencers.id) FROM fan_influencers
WHERE influencer_id = influencers.id
) AS fans_count
FROM "influencers"

This solution is slower than having the counter cache when rendering the influencer list since the latter one only has to select attributes of the same table. However, using this method you don’t need to keep any counter cache column updated, making create/destroy/update actions on the fan_influencer table simpler since they don’t trigger any callback afterwards. The db lock problem will go away.

So, this SQL alternative could be appropriate for you if:

  • You have many create/destroy/update actions on the fan_influencer table in comparison to requesting the influencers fans count.
  • Your app is prone to facing situations like the mentioned above, where the counter cache of a single record could get updated more times in a small period of time than it can handle.

Comparing the three approaches
We measured the time it took to request all the influencers each with his fans count using the three different approaches. In the db there were only 10 influencers and 1,000 fans, all the fans following all the influencers. The times are calculated as the average Active Record time of 5 runs.

Method #1: Using naive approach (fans.count)
18.6ms
Method #2: Using counter cache column
0.5ms
Method #3: Using SQL dynamic count
3.7ms

Second solution. Store the changes of the counter in a buffer in a faster storage first (like redis or Memcached) to later update the real value in the database.
After a fan_influencer record has been created or destroyed, the value in the redis cache for the influencer key is increased or decreased by one. After a couple of seconds, the count value for the redis key is added (or subtracted) to the counter cache in the db, and reset to 0.
A good idea is to also use a recalculation job once in a while, to make sure the counters are updated with the real count value, compensating for transient errors that may cause the deferred counters to drift from the actual value.

There are many gems that help you achieve this if you don’t want to implement it yourself.
In this post it was used Wanelo gem to enqueue the jobs for me, but since it hasn’t been committed to since Oct 7, 2016 you might want to implement that part yourself.

Next it’s explained how to achieve this using that gem and assuming you have redis and a background worker installed. In this case it was used Delayed Jobs as the worker but you will see that it is very simple to modify the solution to use another one.

Start by configuring the gem with:

# config/initializers/counter_cache.rb
require 'cc_worker'
require 'redis'

Counter::Cache.configure do |c|
  c.default_worker_adapter = CCWorker
  c.recalculation_delay = (ENV['CC_RECALCULATION_DELAY_HOURS'] || '5').to_f.hours
  c.redis_pool = $redis
  c.counting_data_store = Counter::Cache::Redis.new
end

Let the fanInfluencer model know how to use the counter cache:

class FanInfluencer < ActiveRecord::Base
  include Counter::Cache
  belongs_to :fan
  belongs_to :influencer
  counter_cache_on column: :fans_count,
                           relation: :influencer,
                           relation_class_name: 'Influencer',
                           wait: (ENV['CC_DEFERRED_DELAY_SECONDS'] || '20').to_i.seconds,
                           recalculation: true,
                           method: :update_fans_count
end

Create a method in the influencer to update the count to the real value

#influencer.rb
def update_fans_count
  update!(fans_count: fans.count)
  fans_count
end

And here is where the fun part begins.
The gem adds a callback after a FanInfluencer record gets created or destroyed that triggers the enqueue method in the class called CCWorker as we configured before. So we need to create this class:

#app/worker/cc_worker.rb
class CCWorker
  def self.enqueue(delay_time, base_class, options)
    enqueuer_class = options[:cache] ? CCDeferredEnqueuer : CCRecalculationEnqueuer
    enqueuer_class.new(delay_time, base_class, options).enqueue_update
  end
end

As you might have noticed, the enqueue method decides which Enqueuer class to delegate the execution to depending on the option ‘cache’. Since the gem is the one that makes the calls to this method with the right arguments, we just need to implement those two modes knowing that we will receive two calls after a fanInfluencer has been created or destroyed: one for recalculation and one for deffered.

We can begin by creating the parent Enqueuer class, that decides if the job needs to be enqueued or if it’s already pending.

# app/worker/cc_enqueuer.rb
class CCEnqueuer
  def initialize(delay_time, base_class, options)
    @options = options
    @delay_time = delay_time
    @base_class = base_class

    @key = form_key
  end

  def enqueue_update
    return if already_queued?
    Delayed::Job.enqueue(self, run_at: @delay_time.from_now)
    mark_as_queued
    Rails.logger.info { "Enqueued #{enqueuer_name}." }
  end

  private

  def already_queued?
    queued = $redis.get(@key)
    queued != nil
  end

  def mark_as_queued
    $redis.set(@key, '1')
  end

  def unmark_as_queued
    $redis.del(@key)
  end
end

We will now implement the deferred mode, for which we will define the ‘execute’ method that will be called by the asyncrhonous worker once the time is up. For this mode the gem comes with a counter class that does the buffering and updating part for us, so we just need to use it.

class CCDeferredEnqueuer < CCEnqueuer
  def perform
    @options[:source_object_class_name] = @base_class.constantize
    counter_class = @options[:counter].constantize
    counter = counter_class.new(nil, @options)
    counter.save!

    unmark_as_queued
    Rails.logger.info { 'Updated counters from cache.' }
  end

  private

  def enqueuer_name
    'deferred'
  end

  def form_key
    class_name = @options[:relation_class_name]
    column_name = @options[:column]
    instance_id = @options[:relation_id]
    "cc::#{class_name}::#{instance_id}::#{column_name}::deferred"
  end
end

For the recalculation mode, we will call the method that we defined in the influencer at the begging, to reset the counter agains the real count. The name of this method comes in the options so we need to invoke it.

class CCRecalculationEnqueuer < CCEnqueuer
  def perform
    class_to_update = @options[:relation_class_name].constantize
    instance_id = @options[:relation_id]
    instance = class_to_update.public_send(:find, instance_id)
    method_name = @options[:method]

    instance.public_send(method_name.to_sym)
    unmark_as_queued
  end

  private

  def enqueuer_name
    'recalculation'
  end

  def form_key
    class_name = @options[:relation_class_name]
    column_name = @options[:column]
    instance_id = @options[:relation_id]
    "cc::#{class_name}::#{instance_id}::#{column_name}::recalculation"
  end
end

And that’s it. Now we can try creating and destroying fans and influencers and we will see how their count changes after the number of seconds that we picked, and a recalculation job is enqueued for later.

To test it, we can create 100 fans that would follow Taylor Swift.

influencer = Influencer.create!(name: 'taylorSwift')
100.times do |i|
  fan = Fan.create!(name: i.to_s)
  FanInfluencer.create!(fan_id: fan.id, influencer_id: influencer.id)
end

For each fanInfluencer created we can see logs like the following, where after the fanInfluencer record is created, two delayed jobs are inserted:

[13] pry(main)> FanInfluencer.create!(fan_id: fan.id, influencer_id: influencer.id)
  SQL (0.9ms)  INSERT INTO "fan_influencers" ("influencer_id", "fan_id", "created_at", "updated_at") VALUES ($1, $2, $3, $4) RETURNING "id"  [["influencer_id", 11], ["fan_id", 1001], ["created_at", "2018-08-01 15:22:57.953303"], ["updated_at", "2018-08-01 15:22:57.953303"]]
  SQL (0.8ms)  INSERT INTO "delayed_jobs" ("handler", "run_at", "created_at", "updated_at") VALUES ($1, $2, $3, $4) RETURNING "id"  [["handler", "--- !ruby/object:CCDeferredEnqueuer\noptions:\n  :relation_class_name: Influencer\n  :relation_id: 11\n  :column: :fans_count\n  :method: :update_fans_count\n  :cache: true\n  :counter: Counter::Cache::Counters::BufferCounter\ndelay_time: !ruby/object:ActiveSupport::Duration\n  parts:\n    :seconds: 20\n  value: 20\nbase_class: FanInfluencer\nkey: cc::Influencer::11::fans_count::deferred\n"], ["run_at", "2018-08-01 15:23:17.987544"], ["created_at", "2018-08-01 15:22:57.990042"], ["updated_at", "2018-08-01 15:22:57.990042"]]
Enqueued deferred.
  SQL (0.7ms)  INSERT INTO "delayed_jobs" ("handler", "run_at", "created_at", "updated_at") VALUES ($1, $2, $3, $4) RETURNING "id"  [["handler", "--- !ruby/object:CCRecalculationEnqueuer\noptions:\n  :relation_class_name: Influencer\n  :relation_id: 11\n  :column: :fans_count\n  :method: :update_fans_count\n  :cache: false\n  :counter: Counter::Cache::Counters::BufferCounter\ndelay_time: !ruby/object:ActiveSupport::Duration\n  parts:\n    :hours: 5.0\n  value: 18000.0\nbase_class: FanInfluencer\nkey: cc::Influencer::11::fans_count::recalculation\n"], ["run_at", "2018-08-01 20:22:58.010273"], ["created_at", "2018-08-01 15:22:58.012356"], ["updated_at", "2018-08-01 15:22:58.012356"]]
Enqueued recalculation.

You can see in the redis cli console how the new keys are created, with the values 100 in the key for the buffer managed by the gem, and a 1 in the other two keys that mark the corresponding jobs as enqueued so they won’t be enqueued again. In this case ‘104’ that you see in the key name corresponds to the id of the influencer created.

127.0.0.1:6379> keys *
1) "cc:In:104:fans"
2) "cc::Influencer::104::fans_count::recalculation"
3) "cc::Influencer::104::fans_count::deferred"
127.0.0.1:6379> get "cc:In:104:fans"
"100"
127.0.0.1:6379> get "cc::Influencer::104::fans_count::recalculation"
"1"
127.0.0.1:6379> get "cc::Influencer::104::fans_count::deferred"
"1"

After the 20 seconds (or whatever you configured for the env var) have passed, if you turned on your worker, we can see Taylor Swifts fans count has been raised.

[19] pry(main)> Influencer.last.fans_count
  Influencer Load (0.5ms)  SELECT  "influencers".* FROM "influencers" ORDER BY "influencers"."id" DESC LIMIT $1  [["LIMIT", 1]]
=> 100

To test the recalculation mode, we can update the fans count to any other incorrect value and wait until the recalculation method fixes it.

Notice that when using Delayed Jobs, since two jobs need to be enqueued (inserted into the db) after a fanInfluencer has been created, the ‘follow’ action is in total slower than before when no other ‘follow’ action is being performed at the same time. However, under big load the row contention problem does not happen, making this approach more efficient in that case.

Since the solution for the enqueuer does not depend on the influencer model, you can reuse it for any other counter cache that you may want to optimize, just by configuring the counter cache in the associated class and defining the recalculation method.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s