I need to mass-update many thousands of records, and I would like to process the updates in batches. First, I tried:
Foo.where(bar: 'bar').find_in_batches.update_all(bar: 'baz')
...which I was hoping would generate SQL such as:
"UPDATE foo SET bar = 'baz' WHERE bar = 'bar' AND id > (whatever id is passed in by find_in_batches)"
That doesn't work because find_in_batches yields plain arrays of records, while update_all needs an ActiveRecord relation.
This is what I tried next:
Foo.where(bar: 'bar').select('id').find_in_batches do |foos|
  ids = foos.map(&:id)
  Foo.where(id: ids).update_all(bar: 'baz')
end
That works, but it obviously runs a select followed by the update, rather than a single update based on my 'where' conditions. Is there any way to clean this up, so that the select and update don't have to be separate queries?
The size of each batch is set by the :batch_size option; the default is 1000. You can control the starting point for the batch processing by supplying the :start option. This is especially useful if you want multiple workers dealing with the same processing queue.
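The way :batch_size and :start carve up the matching records can be sketched in plain Ruby, without ActiveRecord (the ids array and all values here are hypothetical stand-ins for a table's primary keys):

```ruby
# Plain-Ruby sketch of how :batch_size and :start partition records.
# The ids array stands in for the primary keys of the matching rows.
ids = (1..25).to_a

batch_size = 10
start_id   = 6   # like the :start option: begin processing at id 6

batches = ids.select { |id| id >= start_id }.each_slice(batch_size).to_a
# two batches: ids 6..15, then ids 16..25
```

Each yielded batch is a plain array, which is why update_all can't simply be chained onto find_in_batches.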
In Rails 5, there's a new handy method, ActiveRecord::Relation#in_batches, to solve this problem:
Foo.in_batches.update_all(bar: 'baz')
Check documentation for details.
I'm surprised, too, that there isn't an easier way to do this... but I did come up with this approach:
batch_size = 1000

0.step(Foo.count, batch_size).each do |offset|
  Foo.where(bar: 'bar').order(:id)
     .offset(offset)
     .limit(batch_size)
     .update_all(bar: 'baz')
end
Basically this will step through the offsets between 0 and Foo.count, stepping by batch_size each time. For example, if Foo.count == 10500 you'd get offsets: [0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000]. For each offset, it orders by id, skips the first offset records, and limits to batch_size, so update_all touches only the batch_size records whose "index" is greater than offset. This is basically the manual way to perform what you said you were hoping for in the generated SQL. Too bad it can't just be done this way already by a standard library method... though I'm sure you could create one of your own.
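The offsets the loop generates can be verified in plain Ruby (10_500 is a hypothetical stand-in for Foo.count):

```ruby
batch_size = 1000
total      = 10_500  # stands in for Foo.count
offsets    = 0.step(total, batch_size).to_a
# offsets runs 0, 1000, ..., 10000 -- eleven batches for 10,500 records
```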
This is 2 years late, but the answers here are a) very slow for large data sets and b) ignore the built-in Rails capabilities (http://api.rubyonrails.org/classes/ActiveRecord/Batches.html).
As the offset value increases, depending on your DB server, the database will do a sequential scan until it reaches your offset, and only then fetch the rows for processing. As your offset gets into the millions, this becomes extremely slow.
Instead, use the find_each iterator method:
Foo.where(a: b).find_each do |bar|
  bar.x = y
  bar.save
end
This has the added benefit of running the model callbacks with each save. If you don't care for the callbacks, then try:
Foo.where(a: b).find_in_batches do |array_of_foo|
  ids = array_of_foo.collect(&:id)
  Foo.where(id: ids).update_all(x: y)
end
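The trade-off between the two approaches can be sketched in plain Ruby, with a Struct standing in for the model and a counter standing in for the save callbacks (all names here are hypothetical, not ActiveRecord):

```ruby
# Plain-Ruby sketch of the trade-off above: per-record saves fire callbacks
# once per record, while a single bulk update skips them entirely.
Record = Struct.new(:id, :x)

records = (1..5).map { |i| Record.new(i) }
callbacks_run = 0

# find_each style: assign and "save" each record, firing callbacks each time
records.each do |r|
  r.x = "y"
  callbacks_run += 1  # stands in for before_save/after_save hooks
end

# find_in_batches + update_all style: collect ids for one bulk write,
# which would issue a single UPDATE ... WHERE id IN (...) with no callbacks
bulk_ids = records.map(&:id)
```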
pdobb's answer is on the right track, but didn't work for me in Rails 3.2.21 because of this issue of ActiveRecord not parsing OFFSET with UPDATE calls:
https://github.com/rails/rails/issues/10849
I modified the code accordingly and it worked fine for concurrently setting the default value on my Postgres table:
batch_size = 1000

0.step(Foo.count, batch_size).each do |offset|
  Foo.where('id > ? AND id <= ?', offset, offset + batch_size).
      order(:id).
      update_all(foo: 'bar')
end
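This version walks id windows rather than row offsets, so the database can use the primary-key index instead of scanning past skipped rows. The windows it generates can be sketched in plain Ruby (10_500 is a hypothetical stand-in for Foo.count; if your ids are sparse or don't start at 1, stepping up to Foo.maximum(:id) would cover them all):

```ruby
batch_size = 1000
upper      = 10_500  # stands in for Foo.count (or Foo.maximum(:id) for sparse ids)
windows    = 0.step(upper, batch_size).map { |lo| ((lo + 1)..(lo + batch_size)) }
# each window covers ids lo+1 .. lo+batch_size, matching 'id > ? AND id <= ?'
```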