Reference: http://api.rubyonrails.org/classes/ActiveRecord/Batches.html.
Is the implementation of find_each
thread-safe? In other words, can I do something like
count = 0
MyModel.find_each do |model|
count += 1 if model.foo?
end
And expect it to be thread-safe?
This question has been left unanswered for a while. I think it is a very good question, as problems of thread-safety like this one can be harmful to an application's integrity and since Rails feels so magical, it's good to always have a look under the hood and understand what's going on.
This method (find_each
) would not be thread-safe in the specified case, if the state of the data can be changed during the execution of the code and affect the outcome. (e.g. block being called with removed data, block being called twice with the same data, and block skipping part of the data).
In summary, find_each
is not thread-safe. It doesn't do any locking, so it does not ensure that the data has been removed, updated, inserted or moved by the time it calls the block. The only thing it ensures is that the block won't be called for the same primary index twice.
Here's an example where it could yield a weird result (stupid case though). Let's assume the following Account
table:
|id|balance|
| 1| 1000|
| 2| 500|
| 3| 2000|
And the following code (let's use batch_size: 1
since it's such a small table):
total = 0
Account.find_in_batches(batch_size: 1) |acc|
total += acc.balance
end
On the first iteration, it would run the block with Account(id: 1, balance: 1000)
, hence total
would equal 1000
.
Now, while the second iteration is being run, another thread runs the following code:
Account.transaction do
acc1 = Account.find(1).lock!
acc3 = Account.find(3).lock!
acc1.update(balance: acc1.balance + acc3.balance)
acc3.update(balance: 0)
end
It basically transfers everything from account 1 to account 3. Now the table would look like:
|id|balance|
| 1| 3000|
| 2| 500|
| 3| 0|
But remember that we have already processed the first account, and it would keep running the block with the second account, so total
would equal 1500
, and then finally run the block for the third account, since the balance is now 0
, the total
would remain at 1500
.
This would cause you to have the total
at 1500
when you're clearly trying to have it at 3500
.
(each
is not completely thread-safe, but it guarantees this case)
If you need to ensure thread safety, a simple way is to get a lock on the table (for example, in postgres). Remember that getting a lock on the whole table can impact your performance a lot.
count = 0
MyModel.transaction do
ActiveRecord::Base.connection.execute("LOCK TABLE mymodels SHARE")
MyModel.find_each do |model|
count += 1 if model.foo?
end
end
Note that MyModel.lock.find_each
is not thread-safe either.
find_each
works by ordering everything by the primary index (usually id
), and limiting the result with batch size (default is 1000
).
SELECT "models".* FROM "models" WHERE "models"."id" > 1000) ORDER BY "models"."id" ASC LIMIT $1
It stores the last id in the batch, then calls the block for each row. Once the block is executed for every row, it runs another query with models.id > last_id
, until it reaches the end.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With