June 4, 2020

Rails N + 2 queries

We usually try to solve performance problems by using [.c-inline-code]#includes[.c-inline-code] to get rid of N + 1 queries but this doesn't always fix the issue; in fact, it can even create more queries under some circumstances. This post shows a few examples of when that happens and how to deal with it.

Your typical N + 1 query problem

We learn how to deal with ActiveRecord performance issues with very plain examples, most blog posts show models that look something like this:

CODE: https://gist.github.com/brunvez/0b07f828da222441ffc9a5c44ef30f1e.js?file=basic_domain.rb

Followed by your standard [.c-inline-code]irb[.c-inline-code] example:

irb> posts = Post.all
irb> all_comments = posts.map { |p| p.comments }

Which generates the following queries, where we can clearly see the N + 1 query problem: Rails is doing one query to retrieve the posts and N additional queries to retrieve the comments, where N is the number of posts.

CODE: https://gist.github.com/brunvez/0b07f828da222441ffc9a5c44ef30f1e.js?file=n_1_simple.sql

And that the solution is to preload the data by calling the [.c-inline-code]#includes[.c-inline-code] method over the posts collection:

irb> posts = Post.includes(:comments)
irb> all_comments = posts.map { |p| p.comments }

Which in turn produces the following optimized queries:

CODE: https://gist.github.com/brunvez/0b07f828da222441ffc9a5c44ef30f1e.js?file=n_1_fixed.sql

And so we start using [.c-inline-code]#includes[.c-inline-code] on all our ActiveRecord queries and move on.

A little ways down the road: changes to queries invalidate data preloading

Time passes and requirements change, it's only normal. Now we need to only show posts that are uncensored and we change our query to reflect that:

irb> posts = Post.includes(:comments)
irb> all_comments = posts.map { |p| p.comments.where(censored: false) }

Can you guess the amount of queries this is going to generate? I'll give you a hint: it's on the name of the post.

CODE: https://gist.github.com/brunvez/0b07f828da222441ffc9a5c44ef30f1e.js?file=n_1_custom.sql

Now we have the N + 1 queries we had in the beginning and also an additional query to preload data we are not going to use (which will increase our application's memory footprint but that's a story for another day). You may wonder why doesn't Rails just iterate the comments collection and select posts that are not censored. Well, in this simple example, it could. But if we start doing some more complex queries and adding raw SQL fragments it's going to get a lot more difficult.

Adding Bullet to log N + 1 queries

I agree that Bullet is a must have in any project, no matter how big or small. It's very difficult to catch every N + 1 in every query we build and it's even harder to detect cases like this where we should no longer preload the data.

After installing bullet what we get is the following warning:

AVOID eager loading detected
  Post => [:comments]
  Remove from your query: .includes([:comments])

Awesome! It knows about our unused preload and it tells us to remove it; let's do that.

irb> posts = Post.all
irb> all_comments = posts.map { |p| p.comments.where(censored: false) }

And let's also check our logs for the generated queries and watch out for warnings from bullet.

CODE: https://gist.github.com/brunvez/0b07f828da222441ffc9a5c44ef30f1e.js?file=n_1_custom_includes.sql

So we are back to square one but this time we have no bullet warnings, so what should we do?

What are the solutions to preloading custom queries?

There are two solutions (at least that I know of) to this performance problem. The first one is to use Rails' preloader but as you may guess from the [.c-inline-code]:nodoc:[.c-inline-code] directive that's a private class not meant to be used outside the framework. I'm not going to even talk about how to preload using that class but if you are curious here's a nice post on how to deal with N + 1 queries on GraphQL using Rails' preloader.

The second solution, and the one I'm going to explain here, is using a scoped association and preloading it instead of the [.c-inline-code]comments[.c-inline-code] association. This requires us to add one more line to our Posts model:

CODE: https://gist.github.com/brunvez/0b07f828da222441ffc9a5c44ef30f1e.js?file=domain_with_association.rb

And change our code to get the comments using the association:

irb> posts = Post.all
irb> all_comments = posts.map { |p| p.uncensored_comments }

Bingo! We get the following warning from bullet:

GET /posts
USE eager loading detected
  Post => [:uncensored_comments]
  Add to your query: .includes([:uncensored_comments])

And sure enough if we add that preload

irb> posts = Post.includes(:uncensored_comments)
irb> all_comments = posts.map { |p| p.uncensored_comments }

We no longer get the warning and our queries are optimized:

CODE: https://gist.github.com/brunvez/0b07f828da222441ffc9a5c44ef30f1e.js?file=n_1_custom_solved.sql

Caveats: Evaluate performance optimizations

As with most performance optimizations you should really measure and evaluate the changes you are about to make. It doesn't really make sense to add an association to your models every time you want to preload, sometimes it is better to have a small performance penalty rather than a model full of associations.

But if your queries are taking too long I really encourage you to add the corresponding associations and preload the data you need.

Bonus: how to preload belongs_to associations

This method also works for cases when you need to just fetch one record. Let's use the blog example and add a use case where we need the most liked comment from each Post:

irb> posts = Post.includes(:comments)
irb> most_liked_comments = posts.map { |p| p.comments.order(likes: :desc).first }

Once again we have an N + 1 and no warning from bullet

CODE: https://gist.github.com/brunvez/0b07f828da222441ffc9a5c44ef30f1e.js?file=n_1_belongs_to.sql

And here is where [.c-inline-code]has_one[.c-inline-code] comes to the rescue, let's modify our Posts model one last time

CODE: https://gist.github.com/brunvez/0b07f828da222441ffc9a5c44ef30f1e.js?file=domain_belongs_to.rb

Bullet now complains about data not being preload us and gives us the solution to our problems

irb> posts = Post.includes(:most_liked_comment)
irb> most_liked_comments = posts.map { |p| p.most_liked_comment }

And we get neat SQL queries once more

CODE: https://gist.github.com/brunvez/0b07f828da222441ffc9a5c44ef30f1e.js?file=n_1_belongs_to_solved.sql

Notice though that in this case we don't get the [.c-inline-code]LIMIT[.c-inline-code] clause on the SQL query and so Rails loads all these comments on memory and then loads them on each post, that's also something to consider: memory usage vs SQL query time. As in most cases there's not a clear answer and you should really measure to see your specific case.


Bruno Vezoli

How can we help you?