MongoDB fast deletion best approach

Question

My application currently use MySQL. In order to support very fast deletion, I organize my data in partitions, according to timestamp. Then when data becomes obsolete, I just drop the whole partition.
It works great, and cleaning up my DB doesn't harm my application performance.

I would want to replace MySQL with MongoDB, and I'm wondering if there's something similiar in MongoDB, or would I just need to delete the records one by one (which, I'm afraid, will be really slow and will make my DB busy, and slow down queries response time).

Zaid Masud · Accepted Answer

In MongoDB, if your requirement is to delete data to limit the collection size, you should use a capped collection.

On the other hand, if your requirement is to delete data based on a timestamp, then a TTL index might be exactly what you're looking for.

From official doc regarding capped collections:

Capped collections automatically remove the oldest documents in the collection without requiring scripts or explicit remove operations.

And regarding TTL indexes:

Implemented as a special index type, TTL collections make it possible to store data in MongoDB and have the mongod automatically remove data after a specified period of time.

Sammaye · Answer

I thought, even though I am late and an answer has already been accepted, I would add a little more.

The problem with capped collections is that they regularly reside upon one shard in a cluster. Even though, in latter versions of MongoDB, capped collections are shardable they normally are not. Adding to this a capped collection MUST be allocated on the spot, so if you wish to have a long history before clearing the data you might find your collection uses up significantly more space than it should.

TTL is a good answer however it is not as fast as drop(). TTL is basically MongoDB doing the same thing, server-side, that you would do in your application of judging when a row is historical and deleting it. If done excessively it will have a detrimental effect on performance. Not only that but it isn't good at freeing up space to your $freelists which is key to stopping fragmentation in MongoDB.

drop()ing a collection will literally just "drop" the collection on the spot, instantly and gracefully giving that space back to MongoDB (not the OS) giving you absolutely no fragmentation what-so-ever. Not only that but the operation is a lot faster, 90% of the time, than most other alternatives.

So I would stick by my comment:

You could factor the data into time series collections based on how long it takes for data to become historical, then just drop() the collection

Edit

As @Zaid pointed out, even with the _id field capped collections are not shardable.

MongoDB fast deletion best approach

Tags:

mongodb

partitioning

danieln

2 Answers

Zaid Masud

Edit

Sammaye

Recent Activity

Donate For Us

MongoDB fast deletion best approach

Tags:

mongodb

partitioning

danieln

2 Answers

Zaid Masud

Edit

Sammaye

Related questions

Recent Activity

Donate For Us