How to delete data from MongoDB without slowing it down to a halt?

Every time we delete larger amounts of data from our MongoDB using collection.remove(), the database becomes so slow that eventually our web servers go down. I believe this is because the remove operation locks the collection for long periods of time.

We have a query that gives us all the documents we want to delete. However, the query does not include a date/time field, so we can't use a TTL index.

Is there a way to remove the data gracefully, releasing the lock from time to time?

asked Oct 14 '15 by Bastian Voigt

People also ask

What is the fastest operation to clear an entire collection in MongoDB?

MongoDB's db.collection.drop() removes an entire collection from the database in a single operation, which is typically the fastest way to clear it.
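
A minimal sketch, assuming a hypothetical collection named yourCollection:

// drops the collection together with all its documents and indexes
db.yourCollection.drop()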

How do I clear my MongoDB database?

First, run the show dbs command to list the databases. Then switch to the database you want to delete with use databasename, and execute db.dropDatabase() to drop it.
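
For example, assuming a hypothetical database named mydb:

// switch to the target database, then drop it
use mydb
db.dropDatabase()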

What is soft delete in MongoDB?

Soft delete performs an update to mark data as deleted instead of physically removing it from a collection in the database. A common way to implement soft delete is to add a field that indicates whether a document has been deleted or not.
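
A minimal sketch, assuming a MongoDB 3.2+ shell and a hypothetical orders collection with a deleted flag field:

// mark matching documents as deleted instead of removing them
db.orders.updateMany(
  { status: "cancelled" },
  { $set: { deleted: true, deletedAt: new Date() } }
)

// exclude soft-deleted documents from reads
db.orders.find({ deleted: { $ne: true } })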

Does removing all collections in a database also remove the database in MongoDB?

Yes. Once you drop every collection in a database, the database itself no longer appears in show dbs.


1 Answer

Using bulk operations

Bulk operations might be of help here. An unordered bulk.find(queryDoc).remove() is basically a version of db.collection.remove(queryDoc) optimized for large numbers of operations. Its usage is pretty straightforward:

// initialize an unordered bulk operation builder
var bulk = db.yourCollection.initializeUnorderedBulkOp()
// queue a remove for every document matching the query
bulk.find(yourQuery).remove()
// send the queued operations to the server
bulk.execute()

Please see Bulk.find().remove() in the MongoDB docs for details.

The idea behind this approach is not to speed up the removal, but to produce less load. In my tests, it roughly halved the load and took slightly less time than a db.collection.remove(query).
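
If you want to release the lock periodically, as the question asks, you can also flush the bulk builder in explicit batches. This is only a sketch; the batch size and the _id-based delete pattern are my assumptions, not something from the original answer:

var batchSize = 1000
var bulk = db.yourCollection.initializeUnorderedBulkOp()
var count = 0

db.yourCollection.find(yourQuery, { _id: 1 }).forEach(function (doc) {
  // queue a single-document delete by _id
  bulk.find({ _id: doc._id }).removeOne()
  count++
  if (count % batchSize === 0) {
    bulk.execute()  // flush this batch; the lock is released between batches
    bulk = db.yourCollection.initializeUnorderedBulkOp()
  }
})

if (count % batchSize !== 0) {
  bulk.execute()  // flush the remainder
}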

Creating an index

However, a remove operation should not stall your instance to the point of freezing. I tested the removal of 12M documents on my 5-year-old MacBook, and while it put some load on the machine, it was far from freezing and took about 10 minutes. However, the field I queried by was indexed.

This leads me to the conclusion that you are probably experiencing a collection scan. If I am right, here is what happens: your query contains fields, or a combination of fields, not covered by an index and for which no index intersection can be constructed. This forces the mongod in question to find, access, and read those fields from disk for each and every document in the database.
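
You can verify this with explain() (mongo shell 3.0+): a winning plan stage of COLLSCAN indicates a full collection scan, while IXSCAN means an index is being used:

// inspect the query plan; look for COLLSCAN vs. IXSCAN in winningPlan
db.yourCollection.find(yourQuery).explain("queryPlanner")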

So, however counterintuitive it may seem, it might be helpful to create an index in the background containing each field in your query prior to the remove operation.

db.collection.createIndex(
  {firstFieldYouQueryBy:1,...,NthFieldYouQueryBy:1},
  {background:true}
)

Although this operation runs in the background, the shell session that issued it will block until the build finishes. This might take a while. You can check the status by opening a second shell and running:

db.currentOp()

(You'll have to search through the output for the index build operation.)
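
To narrow things down, the helper also accepts a filter document; matching the operation's progress message is one option, though the exact msg text varies by server version, so treat this as a sketch:

// show only operations whose progress message mentions an index build
db.currentOp({ msg: /Index Build/ })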

When the index is created (which you can check by using db.collection.getIndexes()), your removal operations should be more efficient and hence faster. After the mass removal is done, you can of course delete the index if it is not needed otherwise.

With an index, you prevent a collection scan, thereby speeding up the removal considerably.

Combining both approaches

It should be obvious that it makes sense to create the index first and issue the bulk command after the index is ready.
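
Put together, the sequence might look like this; the field names and query values are placeholders, not part of the original answer:

// 1. Build an index covering the query fields in the background
db.yourCollection.createIndex({ status: 1, type: 1 }, { background: true })

// 2. Once the build has finished, remove via an unordered bulk operation
var bulk = db.yourCollection.initializeUnorderedBulkOp()
bulk.find({ status: "obsolete", type: "log" }).remove()
bulk.execute()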

answered Nov 01 '22 by Markus W Mahlberg