In our company we have a retention of eight days of data (with one million of records aprox.) so we have a cronjob that remove documents older than eight days each day. Now we're using the Published field and this field is not indexed.
It takes like 15 minutes to finish to get rid off 100.000 records and we found that this operation is too long.
This is the query where 'docs' is a variable with an array of documents that we don't want to remove. The 'theDate' variable is the date of eight days ago.
records.remove( { "Published" : { $lte : theDate }, "_id" : { $nin : docs } }
Would be it better to use the _id field, which is indexed, in ordered to do this operation? How can we use the _id field in order to do the same operation?
Discard the Cron job entirely: this is a job for TTL indexes. http://docs.mongodb.org/manual/core/index-ttl/
Create a TTL index on the Published
field with expireAfterSeconds: 691200
and watch as your documents are automatically removed 8 days after publication.
And if you don't want to indiscriminately delete all documents 8 days after their publication, keep your Cron job and just create a plain index on the Published
field.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With