Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I run updates in batches in Rails 3/4?

I need to mass-update many thousands of records, and I would like to process the updates in batches. First, I tried:

Foo.where(bar: 'bar').find_in_batches.update_all(bar: 'baz')

...which I was hoping would generate SQL such as:

"UPDATE foo SET bar = 'baz' where bar='bar' AND id > (whatever id is passed in by find_in_batches)"

That doesn't work because find_in_batches returns an array, while update_all needs an ActiveRecord relation.

This is what I tried next:

Foo.where(bar: 'bar').select('id').find_in_batches do |foos|
  ids = foos.map(&:id)
  Foo.where(id: ids).update_all(bar: 'baz')
end

That works, but it obviously runs a select followed by the update, rather than a single update based on my 'where' conditions. Is there any way to clean this up, so that the select and update don't have to be separate queries?

like image 210
MothOnMars Avatar asked Apr 23 '14 18:04

MothOnMars


People also ask

What is batch size in rails?

The size of each batch is set by the :batch_size option; the default is 1000. You can control the starting point for the batch processing by supplying the :start option. This is especially useful if you want multiple workers dealing with the same processing queue.

Does update call Save rails?

update!(attributes) LinkUpdates its receiver just like update but calls save! instead of save, so an exception is raised if the record is invalid. Also aliased as: update_attributes!


4 Answers

In Rails 5, there's a new handy method ActiveRecord::Relation#in_batches to solve this problem:

Foo.in_batches.update_all(bar: 'baz')

Check documentation for details.

like image 125
dlackty Avatar answered Oct 21 '22 22:10

dlackty


I'm surprised, too, that there isn't an easier way to do this... but I did come up with this approach:

batch_size = 1000
0.step(Foo.count, batch_size).each do |offset|
  Foo.where(bar: 'bar').order(:id)
                       .offset(offset)
                       .limit(batch_size)
                       .update_all(bar: 'baz')
end

Basically this will:

  1. Create an array of offsets between 0 and Foo.count stepping by batch_size each time. For example, if Foo.count == 10500 you'd get: [0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000]
  2. Loop through these numbers and use them as an OFFSET in the SQL query, being sure to order by id, and limiting to the batch_size.
  3. Update at most batch_size records whose "index" is greater than offset.

This is basically the manual way to perform what you said you were hoping for in the generated SQL. Too bad it can't just be done this way already by a standard library method... though I'm sure you could create one of your own.

like image 24
pdobb Avatar answered Oct 21 '22 23:10

pdobb


This is 2 years late, but the answers here are a) very slow for large data sets and b) ignore the builtin rails capabilities (http://api.rubyonrails.org/classes/ActiveRecord/Batches.html).

As the offset value increases, depending on your DB server, it will do a sequence scan until it reaches your block, and then fetches the data for processing. As your offset gets into the millions, this will be extremely slow.

use the "find_each" iterator method:

Foo.where(a: b).find_each do |bar|
   bar.x = y
   bar.save
end

This has the added benefit of running the model callbacks with each save. If you don't care for the callbacks, then try:

Foo.where(a: b).find_in_batches do |array_of_foo|
  ids = array_of_foo.collect &:id
  Foo.where(id: ids).update_all(x: y)
end
like image 40
Faisal Avatar answered Oct 21 '22 22:10

Faisal


pdobb's answer is on the right track, but didn't work for me in Rails 3.2.21 because of this issue of ActiveRecord not parsing OFFSET with UPDATE calls:

https://github.com/rails/rails/issues/10849

I modified the code accordingly and it worked fine for concurrently setting the default value on my Postgres table:

batch_size = 1000
0.step(Foo.count, batch_size).each do |offset|
  Foo.where('id > ? AND id <= ?', offset, offset + batch_size).
      order(:id).
      update_all(foo: 'bar')
end
like image 4
Charlie Tran Avatar answered Oct 21 '22 22:10

Charlie Tran