This question is an offshoot of "HABTM associations in Rails: collecting and counting the categories of a model's children".
class Category < ActiveRecord::Base
  has_and_belongs_to_many :books
  validates_uniqueness_of :name
end

class Book < ActiveRecord::Base
  has_and_belongs_to_many :categories
end

class Store < ActiveRecord::Base
  has_many :books
  has_many :categories, through: :books
end
Given a store, list the number of books per category.
Store.first.books_per_category
Desired output:
[ { name: 'mystery', count: 5 }, { name: 'fantasy', count: 6 } ]
However, each store may have a very large number of books and categories.
I am trying to create a single performant query that fetches only the name column and a count of the books for each distinct Category associated with a Store, without loading the books into memory. Here is what I have tried so far:
class Store < ActiveRecord::Base
  # Will load each book into memory
  def books_per_category
    categories.eager_load(:books).map do |c|
      {
        name: c.name,
        count: c.books.size # Using size instead of count is important since count will always query the DB
      }
    end
  end

  # Will query the books count for each category
  def books_per_category2
    categories.distinct.map do |c|
      {
        name: c.name,
        count: c.books.count
      }
    end
  end
end
ActiveRecord::Schema.define(version: 20150508184514) do
  create_table "books", force: true do |t|
    t.string "title"
    t.datetime "created_at"
    t.datetime "updated_at"
    t.integer "store_id"
  end

  add_index "books", ["store_id"], name: "index_books_on_store_id"

  create_table "books_categories", id: false, force: true do |t|
    t.integer "book_id", null: false
    t.integer "category_id", null: false
  end

  add_index "books_categories", ["book_id", "category_id"], name: "index_books_categories_on_book_id_and_category_id"
  add_index "books_categories", ["category_id", "book_id"], name: "index_books_categories_on_category_id_and_book_id"

  create_table "categories", force: true do |t|
    t.string "name"
    t.datetime "created_at"
    t.datetime "updated_at"
  end

  create_table "stores", force: true do |t|
    t.string "name"
    t.datetime "created_at"
    t.datetime "updated_at"
  end
end
You'll want to create a method (or scope) on the Category model, something like:
Category.joins(:books)
        .select('categories.*, COUNT(books.id) as book_count')
        .group('categories.id')
Each resulting object will have every attribute of a Category instance and will also respond to a book_count method, which returns the number of books associated with that category's id.
Note that this omits any categories that have no books associated with them. If you want to include those, change the query to the following (left_outer_joins requires Rails 5 or later):
Category.left_outer_joins(:books)
        .select('categories.*, COUNT(books_categories.book_id) as book_count')
        .group('categories.id')
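To see why the outer-join version counts a join-table column rather than every row, here is a small plain-Ruby sketch. The rows and the helper names (ROWS, count_star, count_column) are made up for illustration and only mimic what SQL would do: a category with no books still produces one row from a LEFT OUTER JOIN, with NULL in the book column, so counting rows would report 1 while counting the column reports 0.

```ruby
# Hypothetical rows, shaped like the result of a LEFT OUTER JOIN from
# categories to books_categories. A category without books appears once,
# with book_id set to nil (SQL NULL).
ROWS = [
  { category: 'mystery', book_id: 1 },
  { category: 'mystery', book_id: 2 },
  { category: 'fantasy', book_id: 3 },
  { category: 'poetry',  book_id: nil }
].freeze

# Mimics COUNT(*): every row in the group is counted, NULLs included.
def count_star(rows)
  rows.group_by { |r| r[:category] }.transform_values(&:size)
end

# Mimics COUNT(book_id): rows whose book_id is NULL are skipped.
def count_column(rows)
  rows.group_by { |r| r[:category] }
      .transform_values { |group| group.count { |r| !r[:book_id].nil? } }
end
```

With these sample rows, count_star reports one book for 'poetry' while count_column reports zero, which is the behaviour you want for empty categories.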
You can chain select and group to aggregate the count of books for each category. Your books_per_category method may look like this:
def books_per_category
  categories.select('categories.id, categories.name, count(books.id) as count')
            .group('categories.id, categories.name').map do |c|
    {
      name: c.name,
      count: c.count
    }
  end
end
This will produce the following SQL query:
SELECT categories.id, categories.name, count(books.id) as count
FROM "categories"
INNER JOIN "books_categories" ON "categories"."id" = "books_categories"."category_id"
INNER JOIN "books" ON "books_categories"."book_id" = "books"."id"
WHERE "books"."store_id" = 1
GROUP BY categories.id, categories.name
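As a possible variation, ActiveRecord's group(...).count returns a plain Hash keyed by the grouped column, so no Category objects need to be built at all. The sketch below only covers the last step, reshaping such a hash into the output format from the question. The helper name format_category_counts is hypothetical, and the ActiveRecord call in the comment is an assumption that has not been run against this schema; grouping by name relies on names being unique, as the validation in the question implies.

```ruby
# Hypothetical helper: turns a { name => count } hash into the
# array-of-hashes shape asked for in the question.
def format_category_counts(counts_by_name)
  counts_by_name.map { |name, count| { name: name, count: count } }
end

# Inside Store it might be called like this (assumption, untested here):
#   format_category_counts(categories.group('categories.name').count('books.id'))

# Sample input shaped like the hash that group(...).count returns.
sample = { 'mystery' => 5, 'fantasy' => 6 }
formatted = format_category_counts(sample)
```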