Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using ActiveRecord find_in_batches method for deleting large data

Okay, so I know that when dealing with very large data, we can use find_in_batches which, as I understand it, does the work of Model.all.each in a really fast way which is much more efficient

Now, I have a really large data I'm trying to delete, and I'm thinking of using this same find_in_batches to delete them in batches.

below is what I have (from within the rake task database.rake):

old_messages = TextMessage.where("created_at < ?", number.days.ago )

old_messages.find_in_batches do |batch|
  batch.delete_all
end

When I run this however, I get the following error:

ArgumentError: wrong number of arguments (0 for 1..3)
/Users/Sunday/.rvm/gems/ruby-2.2.0/gems/httparty-0.13.5/lib/httparty.rb:66:in `logger'
/Users/Sunday/.rvm/gems/ruby-2.2.0/gems/activerecord-4.2.2/lib/active_record/relation/delegation.rb:94:in `public_send'
/Users/Sunday/.rvm/gems/ruby-2.2.0/gems/activerecord-4.2.2/lib/active_record/relation/delegation.rb:94:in `block in method_missing'
/Users/Sunday/.rvm/gems/ruby-2.2.0/gems/activerecord-4.2.2/lib/active_record/relation.rb:302:in `scoping'
/Users/Sunday/.rvm/gems/ruby-2.2.0/gems/activerecord-4.2.2/lib/active_record/relation/delegation.rb:94:in `method_missing'
/Users/Sunday/.rvm/gems/ruby-2.2.0/gems/activerecord-4.2.2/lib/active_record/relation/batches.rb:112:in `find_in_batches'
/Users/Sunday/workspace/resilience/lib/tasks/database.rake:18:in `block (2 levels) in <top (required)>'

where database.rake on line 18 is: old_messages.find_in_batches do |batch|.

So, is it possible to use find_in_batches for deleting like I'm trying to do? and what am I doing wrong, if the answer to that is yes. Thanks for all response.

like image 715
x6iae Avatar asked Oct 08 '15 17:10

x6iae


4 Answers

I don't think anyone has answered your question.

To answer 'what you are doing wrong' and can you use 'find_in_batches' in that way:

The reason why 'delete_all' does not work, is because 'delete_all' only works on activerecord relations. When you use 'find_in_batches' the variable 'batch' is now just a normal array, which may have it's own 'delete_all' method that is different,

You may need 'find_in_batches' incase if you have thousands of records to be deleted. So the previous answer is incorrect. (It may lead to memory exceeded exceptions and timeouts)

Note that is not related to the original error you displayed, but you cannot use 'batch' with 'delete_all' because 'batch' is an array and 'delete_all' is for activerecords

How to delete using find_in_batches

I was having a similar problem

user.posts.destroy_all

was overloading the server because of thousands of posts (this is an example my actual model was not 'posts')

You can use

user.posts.select(:id).find_in_batches(batch_size: 100) do |ids|
  Post.where(id: ids).delete_all
end

If it was one sql call, it will try to store all the delete items in memory at once that can break the server, This will have a manageable size of sql calls.

like image 186
sonnyhe2002 Avatar answered Oct 16 '22 19:10

sonnyhe2002


New functionality: #in_batches (https://api.rubyonrails.org/classes/ActiveRecord/Batches.html#method-i-in_batches)

Person.where("age > 21").in_batches do |relation|
  relation.delete_all
end
like image 42
kwerle Avatar answered Oct 16 '22 19:10

kwerle


Use Model.in_batches.destroy_all.

Be mindful of dependent associations which might still be loaded without batching. I created a gem to fix this: batch_dependent_associations

See also dhh's issue in Rails: Relation#destroy_all should perform its work in batches

like image 25
thisismydesign Avatar answered Oct 16 '22 19:10

thisismydesign


Using in_batches or find_in_batches with delete_all, like other answers suggest, will result in two queries per batch; one SELECT query to fetch the records and one DELETE query to delete the records.

A way to do batched deletes without fetching records could be:

def delete_batched(records, batch_size: 1000)
  loop do
    amount_deleted = records.reorder(:id).limit(batch_size).delete_all
    break if amount_deleted < batch_size
  end
end

delete_batched(TextMessage.where("created_at < ?", number.days.ago))

(It is my understanding that a deterministic processing order is desirable when doing deletes like this, hence the ordering by :id. Using reorder will replace any already set order)

like image 30
Moeghoeg Avatar answered Oct 16 '22 19:10

Moeghoeg