Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MongoDB fast deletion best approach

My application currently use MySQL. In order to support very fast deletion, I organize my data in partitions, according to timestamp. Then when data becomes obsolete, I just drop the whole partition.
It works great, and cleaning up my DB doesn't harm my application performance.

I would want to replace MySQL with MongoDB, and I'm wondering if there's something similiar in MongoDB, or would I just need to delete the records one by one (which, I'm afraid, will be really slow and will make my DB busy, and slow down queries response time).

like image 474
danieln Avatar asked Dec 06 '22 07:12

danieln


2 Answers

In MongoDB, if your requirement is to delete data to limit the collection size, you should use a capped collection.

On the other hand, if your requirement is to delete data based on a timestamp, then a TTL index might be exactly what you're looking for.

From official doc regarding capped collections:

Capped collections automatically remove the oldest documents in the collection without requiring scripts or explicit remove operations.

And regarding TTL indexes:

Implemented as a special index type, TTL collections make it possible to store data in MongoDB and have the mongod automatically remove data after a specified period of time.

like image 66
Zaid Masud Avatar answered Jan 08 '23 10:01

Zaid Masud


I thought, even though I am late and an answer has already been accepted, I would add a little more.

The problem with capped collections is that they regularly reside upon one shard in a cluster. Even though, in latter versions of MongoDB, capped collections are shardable they normally are not. Adding to this a capped collection MUST be allocated on the spot, so if you wish to have a long history before clearing the data you might find your collection uses up significantly more space than it should.

TTL is a good answer however it is not as fast as drop(). TTL is basically MongoDB doing the same thing, server-side, that you would do in your application of judging when a row is historical and deleting it. If done excessively it will have a detrimental effect on performance. Not only that but it isn't good at freeing up space to your $freelists which is key to stopping fragmentation in MongoDB.

drop()ing a collection will literally just "drop" the collection on the spot, instantly and gracefully giving that space back to MongoDB (not the OS) giving you absolutely no fragmentation what-so-ever. Not only that but the operation is a lot faster, 90% of the time, than most other alternatives.

So I would stick by my comment:

You could factor the data into time series collections based on how long it takes for data to become historical, then just drop() the collection

Edit

As @Zaid pointed out, even with the _id field capped collections are not shardable.

like image 36
Sammaye Avatar answered Jan 08 '23 10:01

Sammaye