Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can you do a group by with find_each in rails?

I am trying to write a function that groups by some columns in a very large table (millions of rows). Is there any way to get find_each to work with this, or is it impossible given that I do not want to order by the id column?

The SQL of my query is: SELECT derivable_type, derivable_id FROM "mytable" GROUP BY derivable_type, derivable_id ORDER BY "mytable"."id" ASC;

The rails find_each automatically adds the ORDER BY clause using a reorder statement. I have tried changing the SQL to:

SELECT MAX(id) AS "mytable"."id", derivable_type, derivable_id FROM "mytable" GROUP BY derivable_type, derivable_id ORDER BY "mytable"."id" ASC;

but that doesn't work either. Any ideas other than writing my own find_each function or overriding the private batch_order function in batches.rb?

like image 445
Julie Avatar asked Nov 08 '22 19:11

Julie


1 Answers

There are at least two approaches to solve this problem:

I. Use subquery:

# query the table and select id, derivable_type and derivable_id
my_table_ids = MyTable
  .group("derivable_type, derivable_id")
  .select("MAX(id) AS my_table_id, derivable_type, derivable_id")

# use subquery to allow rails to use ORDER BY in find_each 
MyTable
  .where(id: my_table_ids.select('my_table_id'))
  .find_each { |row| do_something(row) } 

II. Write custom find_each function

rows = MyTable
  .group("derivable_type, derivable_id")
  .select("derivable_type, derivable_id")

find_each_grouped(rows, ['derivable_type', 'derivable_id']) do |row| 
  do_something(row)
end

def find_each_grouped(rows, columns, &block)
  offset = 0
  batch_size = 1_000
  loop do
    batch = rows
      .order(columns)
      .offset(offset)
      .limit(limit)

    batch.each(&block)

    break if batch.size < limit
    offset += limit
  end
end
like image 58
Hirurg103 Avatar answered Nov 15 '22 09:11

Hirurg103