I'd like to delete a large number of old documents from one collection, so it makes sense to use the bulk API. Deleting them is as simple as:
var bulk = db.myCollection.initializeUnorderedBulkOp();
bulk.find({
    _id: {
        $lt: oldestAllowedId
    }
}).remove();
bulk.execute();
The only problem is that this will attempt to delete every document matching the criteria, and in this case that is millions of documents, so for performance reasons I don't want to delete them all at once. I want to enforce a limit on the operation so that I can do something like bulk.limit(10000).execute();
and space the operations out by a few seconds to prevent locking the database for longer than necessary. However, I have been unable to find any option that can be passed to bulk to limit the number of operations it executes.
Is there a way to limit bulk operations in this manner?
Before anyone mentions it, I know that bulk will automatically split operations into 1000-document chunks, but it will still execute all of those chunks sequentially as fast as it can. This has a much larger performance impact than I can deal with right now.
You can iterate over the array of _id values of the documents that match your query using the .forEach() method. The best way to return that array is with the .distinct() method. You then use bulk operations to remove your documents in batches.
var bulk = db.myCollection.initializeUnorderedBulkOp();
var count = 0;
var ids = db.myCollection.distinct('_id', { '_id': { '$lt': oldestAllowedId } });

ids.forEach(function(id) {
    bulk.find({ '_id': id }).removeOne();
    count++;
    if (count % 1000 === 0) {
        // Execute every 1000 operations and re-initialize
        bulk.execute();
        // Here you can sleep for a while between batches
        bulk = db.myCollection.initializeUnorderedBulkOp();
    }
});

// Flush any remaining queued operations; calling execute() on an
// empty bulk object throws, so only flush when something is queued
if (count % 1000 !== 0) {
    bulk.execute();
}