Every time we delete a large amount of data from our MongoDB using collection.remove(), the database becomes so slow that eventually our web servers go down. I believe this is because the remove operation locks the collection for long periods of time.
We have a query that matches all the documents we want to delete. However, the query does not include a date/time field, so we can't use a TTL index.
Is there a way to remove the data in a gentler way, freeing the lock from time to time?
Bulk operations might be of help here. An unordered bulk.find(queryDoc).remove() is basically a version of db.collection.remove(queryDoc) that is optimized for large numbers of operations. Its usage is pretty straightforward:
var bulk = db.yourCollection.initializeUnorderedBulkOp()
bulk.find(yourQuery).remove()
bulk.execute()
Please see Bulk.find().remove() in the MongoDB docs for details.
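bulk.execute() returns a BulkWriteResult, so if you want to confirm how many documents were actually deleted you can inspect its nRemoved field, roughly like this:
var result = bulk.execute()
// nRemoved holds the number of documents removed by the bulk operation
print(result.nRemoved)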
The idea behind this approach is not to speed up the removal, but to produce less load. In my tests, the load was reduced by half, and the operation took slightly less time than a plain db.collection.remove(query).
However, a remove operation should not stall your instance to the point of freezing. I tested the removal of 12M documents on my 5-year-old MacBook, and while it put some load on the machine, it was far from freezing and took some 10 minutes. Note, though, that the field I used to query by was indexed.
This leads me to suspect that you are experiencing a collection scan. If I am right, here is what happens: your query contains fields, or a combination of fields, that are not covered by an index and for which no index intersection can be constructed. This forces the mongod in question to find, access and read those fields for each and every document in the collection from disk.
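You can check whether this is the case before removing anything by running the query through explain() and looking at the winning plan. A minimal sketch, using the same yourQuery placeholder as above:
db.yourCollection.find(yourQuery).explain("executionStats")
// In the output, queryPlanner.winningPlan contains a COLLSCAN stage if every
// document has to be read, and an IXSCAN stage if an index is being used.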
So, however counterintuitive it may seem, it might be helpful to create an index covering each field in your query, in the background, prior to the remove operation:
db.collection.createIndex(
{firstFieldYouQueryBy:1,...,NthFieldYouQueryBy:1},
{background:true}
)
Although this operation is done in the background, the shell will block, and it might take a while. You can check the status by opening a second shell and running:
db.currentOp()
(You'll have to search a bit).
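Alternatively, db.currentOp() accepts a filter document that narrows the output down. Depending on your server version, something along these lines should list only index builds (treat it as a sketch, since the exact field layout varies between MongoDB versions):
// show only operations that are currently creating indexes
db.currentOp({"command.createIndexes": {$exists: true}})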
Once the index is created (which you can verify with db.collection.getIndices()), your remove operations should be more efficient and hence faster. After the mass removal is done, you can of course drop the index again if it is not needed otherwise.
With an index you prevent a collection scan, thereby speeding up the removal considerably.
It should be obvious that it makes sense to create the index first and issue the bulk command after the index is ready.
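Putting it all together, the whole procedure might look roughly like this; the field names and yourQuery are placeholders, so treat it as a sketch and try it on a test system first:
// 1. Build the index in the background so reads and writes can continue
db.yourCollection.createIndex(
    {firstFieldYouQueryBy: 1, /* ..., */ NthFieldYouQueryBy: 1},
    {background: true}
)
// 2. Once db.yourCollection.getIndices() lists the new index,
//    queue and execute the removals as a single unordered bulk operation
var bulk = db.yourCollection.initializeUnorderedBulkOp()
bulk.find(yourQuery).remove()
printjson(bulk.execute())
// 3. Drop the index again if nothing else needs it
db.yourCollection.dropIndex({firstFieldYouQueryBy: 1, /* ..., */ NthFieldYouQueryBy: 1})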