
Order by count on has_many relation

This is a problem I run into frequently. There have been some similar questions about it, but none of them is very complete, and they may be outdated, since Rails 4 might have introduced new functions that help here.

Let me give a simple example of the problem and the known ways to 'solve' it:


Say I have a User model and a Post model, and a User has_many :posts.

Now, I want to get the top five users with the most posts.

The following are options I know, but they all have their own drawbacks:

1)

users = User.all
@top_users = users.sort {|a,b| b.posts.count <=> a.posts.count}.take(5)

Drawbacks: A database request is made for each user (more than one, in fact, because the block runs a count on every comparison), making this solution very slow.

2) Use SQL code directly with a join (see for instance this question and answer)

User.select('users.*, COUNT(posts.id) AS posts_count').joins(:posts).group('users.id').order('posts_count DESC').take(5)

This runs all sorting logic in the database. However:

  • We use a lot of DB-specific code (PostgreSQL, for example, may need different syntax). It would be better to use ActiveRecord methods, if possible.
  • Using an inner join means that users without any posts will never be returned. This is a problem when we want to return users without posts as well.

3) Use SQL directly with an outer join (see for instance this question and answers)

User.select("users.*, COUNT(posts.id) AS posts_count").joins("LEFT OUTER JOIN posts ON posts.user_id = users.id").group("users.id").order("posts_count DESC")

This also returns users without posts. Drawbacks:

  • Even more DB-specific code than #2, and even harder to read.
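
To make the difference between options 2 and 3 concrete, here is a rough plain-Ruby sketch of what the two grouped queries compute when grouping by user. It does not touch a database; UserRow, PostRow and the three method names are hypothetical stand-ins invented for this illustration.

```ruby
# Hypothetical stand-ins for rows of the users and posts tables.
UserRow = Struct.new(:id, :name)
PostRow = Struct.new(:id, :user_id)

# Posts per user_id; users without posts are simply absent from the hash.
def post_counts(posts)
  posts.each_with_object(Hash.new(0)) { |post, acc| acc[post.user_id] += 1 }
end

# Inner join, grouped by user: only users with at least one post appear.
def inner_join_ranking(users, posts)
  counts = post_counts(posts)
  users.select { |u| counts.key?(u.id) }
       .map { |u| [u.name, counts[u.id]] }
       .sort_by { |_, n| -n }
end

# Left outer join, grouped by user: every user appears, with 0 for no posts.
def outer_join_ranking(users, posts)
  counts = post_counts(posts)
  users.map { |u| [u.name, counts.fetch(u.id, 0)] }
       .sort_by { |_, n| -n }
end

users = [UserRow.new(1, "ann"), UserRow.new(2, "bob"), UserRow.new(3, "cy")]
posts = [PostRow.new(10, 1), PostRow.new(11, 1), PostRow.new(12, 2)]

p inner_join_ranking(users, posts) # "cy" is missing here
p outer_join_ranking(users, posts) # "cy" is listed with a count of 0
```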

4) Use a Counter Cache Column (For a full explanation of this technique, see this Railscasts episode)

Basically, create a new column on the User that keeps track of the current count of posts for that user, by changing the value in the field each time a post is created or deleted.

This is very fast and readable. The drawback is that we can only use this after we have defined a new field on the User. For many situations this is acceptable, but it is less flexible: the users table needs a new column for every association we might want a top five for. Also, as this is a cached field, there are database manipulations that will not trigger an update on the field.
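
In Rails itself a counter cache is normally declared with belongs_to :user, counter_cache: true on Post plus an integer posts_count column on users. The sketch below only imitates that mechanism in plain Ruby, to show why the top five becomes a simple sort and how the cached value can drift. CachedUser, PostStore and top_users are hypothetical names made up for the example, not Rails APIs.

```ruby
# Hypothetical in-memory stand-in for a users row with a posts_count column.
CachedUser = Struct.new(:id, :name, :posts_count)

# Minimal stand-in for the posts table that maintains the cached counter.
class PostStore
  def initialize
    @rows = []     # each row is [post_id, owner]
    @next_id = 1
  end

  # Creating a post bumps the owner's counter, as a counter cache callback would.
  def create(user)
    id = @next_id
    @next_id += 1
    @rows << [id, user]
    user.posts_count += 1
    id
  end

  # Destroying a post decrements the counter again.
  def destroy(post_id)
    row = @rows.find { |id, _| id == post_id }
    return unless row
    @rows.delete(row)
    row[1].posts_count -= 1
  end

  # Removes rows WITHOUT touching the counter, like a bulk delete that skips callbacks.
  def delete_all_for(user)
    @rows.reject! { |_, owner| owner.equal?(user) }
  end

  # The true number of posts, counted from the rows themselves.
  def real_count(user)
    @rows.count { |_, owner| owner.equal?(user) }
  end
end

# Top N is a plain sort on the cached column; no join or per-user query is needed.
def top_users(users, n = 5)
  users.sort_by { |u| -u.posts_count }.first(n)
end
```

After delete_all_for, real_count and posts_count disagree, which is the kind of silent drift mentioned above.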

Is there a nicer (readable and efficient) way to accomplish this? Preferably something that uses built-in ActiveRecord methods.

Qqwy asked Aug 10 '15 11:08


1 Answer

Another method, with some limitations that might make it more of a partial solution:

User.where(:id => Post.group(:user_id).
                       order("count(*) desc").
                       limit(5).
                       count.
                       keys)

This would be extremely efficient in database terms at finding the five users with the highest numbers of posts, as it only needs to scan an index on the posts table's user_id column, so it would be good for very large data sets. It's also pretty "clean" Rails/ActiveRecord code that ought to be practically database independent.

If returning the Users in their post-count order is critical, then a less efficient sorting method could be used once those five have been identified, or the retrieval order of the keys could be used in Ruby to sort the returned Users.
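
The second idea, reusing the retrieval order of the keys, might look roughly like this. It is a minimal plain-Ruby sketch: UserRecord stands in for a loaded User, and in_ranked_order is a hypothetical helper name, not part of ActiveRecord.

```ruby
# Hypothetical stand-in for a loaded User record.
UserRecord = Struct.new(:id, :name)

# ranked_ids: user ids in descending post-count order (the keys of the grouped count).
# users:      the matching users, in whatever order the database returned them.
def in_ranked_order(users, ranked_ids)
  by_id = users.each_with_object({}) { |user, lookup| lookup[user.id] = user }
  # Walk the ids in ranking order; ids with no matching user are skipped.
  ranked_ids.map { |id| by_id[id] }.compact
end

users = [UserRecord.new(3, "cy"), UserRecord.new(1, "ann"), UserRecord.new(2, "bob")]
ranked_ids = [2, 3, 1]

puts in_ranked_order(users, ranked_ids).map(&:name).join(", ")
```

Since only five records are involved, the cost of this in-memory reordering is negligible. In a Rails app, ActiveSupport's index_by(&:id) could replace the manual hash building.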

David Aldridge answered Oct 28 '22 13:10