Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is ActiveRecord::Batches::find_each thread safe?

Reference: http://api.rubyonrails.org/classes/ActiveRecord/Batches.html.

Is the implementation of find_each thread-safe? In other words, can I do something like

count = 0
MyModel.find_each do |model|
    count += 1 if model.foo?
end

And expect it to be thread-safe?

like image 884
franklsf95 Avatar asked Apr 07 '15 21:04

franklsf95


1 Answers

This question has been left unanswered for a while. I think it is a very good question, as problems of thread-safety like this one can be harmful to an application's integrity and since Rails feels so magical, it's good to always have a look under the hood and understand what's going on.

This method (find_each) would not be thread-safe in the specified case, if the state of the data can be changed during the execution of the code and affect the outcome. (e.g. block being called with removed data, block being called twice with the same data, and block skipping part of the data).

In summary, find_each is not thread-safe. It doesn't do any locking, so it does not ensure that the data has been removed, updated, inserted or moved by the time it calls the block. The only thing it ensures is that the block won't be called for the same primary index twice.

Here's an example where it could yield a weird result (stupid case though). Let's assume the following Account table:

|id|balance|
| 1|   1000|
| 2|    500|
| 3|   2000|

And the following code (let's use batch_size: 1 since it's such a small table):

total = 0
Account.find_in_batches(batch_size: 1) |acc|
   total += acc.balance
end

On the first iteration, it would run the block with Account(id: 1, balance: 1000), hence total would equal 1000. Now, while the second iteration is being run, another thread runs the following code:

Account.transaction do
    acc1 = Account.find(1).lock!
    acc3 = Account.find(3).lock!
    acc1.update(balance: acc1.balance + acc3.balance)
    acc3.update(balance: 0)
end

It basically transfers everything from account 1 to account 3. Now the table would look like:

|id|balance|
| 1|   3000|
| 2|    500|
| 3|      0|

But remember that we have already processed the first account, and it would keep running the block with the second account, so total would equal 1500, and then finally run the block for the third account, since the balance is now 0, the total would remain at 1500. This would cause you to have the total at 1500 when you're clearly trying to have it at 3500.

(each is not completely thread-safe, but it guarantees this case)


If you need to ensure thread safety, a simple way is to get a lock on the table (for example, in postgres). Remember that getting a lock on the whole table can impact your performance a lot.

count = 0
MyModel.transaction do
   ActiveRecord::Base.connection.execute("LOCK TABLE mymodels SHARE")
   MyModel.find_each do |model|
       count += 1 if model.foo?
   end
end

Note that MyModel.lock.find_each is not thread-safe either.


find_each works by ordering everything by the primary index (usually id), and limiting the result with batch size (default is 1000).

SELECT  "models".* FROM "models" WHERE "models"."id" > 1000) ORDER BY "models"."id" ASC LIMIT $1

It stores the last id in the batch, then calls the block for each row. Once the block is executed for every row, it runs another query with models.id > last_id, until it reaches the end.

like image 70
Alex Santos Avatar answered Sep 22 '22 14:09

Alex Santos