I am trying to write a function that groups by some columns in a very large table (millions of rows). Is there any way to get find_each
to work with this, or is it impossible given that I do not want to order by the id column?
The SQL of my query is:
SELECT derivable_type, derivable_id FROM "mytable" GROUP BY derivable_type, derivable_id ORDER BY "mytable"."id" ASC;
The rails find_each
automatically adds the ORDER BY clause using a reorder
statement. I have tried changing the SQL to:
SELECT MAX(id) AS "mytable"."id", derivable_type, derivable_id FROM "mytable" GROUP BY derivable_type, derivable_id ORDER BY "mytable"."id" ASC;
but that doesn't work either. Any ideas other than writing my own find_each function or overriding the private batch_order
function in batches.rb?
There are at least two approaches to solve this problem:
I. Use subquery:
# query the table and select id, derivable_type and derivable_id
my_table_ids = MyTable
.group("derivable_type, derivable_id")
.select("MAX(id) AS my_table_id, derivable_type, derivable_id")
# use subquery to allow rails to use ORDER BY in find_each
MyTable
.where(id: my_table_ids.select('my_table_id'))
.find_each { |row| do_something(row) }
II. Write custom find_each function
rows = MyTable
.group("derivable_type, derivable_id")
.select("derivable_type, derivable_id")
find_each_grouped(rows, ['derivable_type', 'derivable_id']) do |row|
do_something(row)
end
def find_each_grouped(rows, columns, &block)
offset = 0
batch_size = 1_000
loop do
batch = rows
.order(columns)
.offset(offset)
.limit(limit)
batch.each(&block)
break if batch.size < limit
offset += limit
end
end
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With