Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Which is the fastest way to remove MongoDB documents by Date?

In our company we have a retention of eight days of data (with one million of records aprox.) so we have a cronjob that remove documents older than eight days each day. Now we're using the Published field and this field is not indexed.

It takes like 15 minutes to finish to get rid off 100.000 records and we found that this operation is too long.

This is the query where 'docs' is a variable with an array of documents that we don't want to remove. The 'theDate' variable is the date of eight days ago.

records.remove( { "Published" : { $lte : theDate }, "_id" : { $nin : docs }  }

Would be it better to use the _id field, which is indexed, in ordered to do this operation? How can we use the _id field in order to do the same operation?

like image 442
midudev Avatar asked Mar 15 '23 11:03

midudev


1 Answers

Discard the Cron job entirely: this is a job for TTL indexes. http://docs.mongodb.org/manual/core/index-ttl/

Create a TTL index on the Published field with expireAfterSeconds: 691200 and watch as your documents are automatically removed 8 days after publication.

And if you don't want to indiscriminately delete all documents 8 days after their publication, keep your Cron job and just create a plain index on the Published field.

like image 129
Max Noel Avatar answered Mar 18 '23 14:03

Max Noel