I use MongoDB to store 30 days of data that arrives as a stream. I am looking for a purging mechanism that lets me throw away the oldest data to make room for new data. I used to use MySQL, where I handled this with partitions: I kept 30 date-based partitions, deleted the oldest one, and created a new partition to hold new data.
When I map the same idea onto MongoDB, I am tempted to use date-based shards. The problem is that this gives a poor data distribution: if all the new data lands in the same shard, that shard becomes hot because many people access it, while the shards holding older data are only lightly loaded.
I could purge by collection instead: keep 30 collections and drop the oldest one to accommodate new data. But there are a couple of problems: 1) if I make the collections smaller, I cannot benefit much from sharding, since sharding is done per collection; 2) my queries have to change to query all 30 collections and take a union.
Please suggest a good purging mechanism (if any) to handle this situation.
There are really only three ways to do purging in MongoDB. It looks like you've already identified several of the trade-offs.
Option #1: single collection
pros
cons
Option #2: collection per day
pros
collection.drop() is very fast.
cons
Option #3: database per day
pros
cons
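For option #2, the daily rotation might be sketched roughly as below. This is only an illustration under stated assumptions: the `events_YYYYMMDD` naming scheme, the `events` prefix, and the helper names are all hypothetical, and the 30-day window is taken from the question. The helpers only compute names; the actual drop would be a `collection.drop()` call per expired name.

```python
from datetime import date, timedelta
from typing import List

RETENTION_DAYS = 30  # from the question: keep 30 days of data


def collection_name(day: date, prefix: str = "events") -> str:
    """Hypothetical naming scheme: one collection per calendar day."""
    return f"{prefix}_{day:%Y%m%d}"


def live_collections(today: date, prefix: str = "events") -> List[str]:
    """Names of the collections inside the retention window, newest first.

    A query over the full window has to visit each of these and merge
    the results on the client side.
    """
    return [collection_name(today - timedelta(days=i), prefix)
            for i in range(RETENTION_DAYS)]


def expired_collections(existing: List[str], today: date,
                        prefix: str = "events") -> List[str]:
    """Names in `existing` that follow the scheme but fall outside the window."""
    keep = set(live_collections(today, prefix))
    return sorted(name for name in existing
                  if name.startswith(prefix + "_") and name not in keep)


# With pymongo (assumed driver), the purge step could then look like:
#   for name in expired_collections(db.list_collection_names(), date.today()):
#       db.drop_collection(name)
```

The list returned by `live_collections` also makes the query-side cost from the question visible: every read over the full window touches 30 collections.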
Now there is an option #4, but it is not a general solution. I know of some people who did "purging" by simply using Capped Collections. There are definitely cases where this works, but it has a bunch of caveats, so you really need to know what you're doing.
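One of those caveats is that a capped collection is bounded by size in bytes, not by age, so a "30 day" window only holds if the size estimate is right. Capped collections also cannot be sharded. A minimal sketch of sizing one, assuming a hypothetical collection name `events`, a daily volume estimate you supply yourself, and an arbitrary 20% headroom:

```python
def capped_create_command(name: str, bytes_per_day: int,
                          days: int = 30, headroom_pct: int = 20) -> dict:
    """Build a `create` command document for a capped collection.

    The size is an estimate: daily volume times the retention window,
    plus some headroom. If real traffic exceeds the estimate, the
    oldest documents are overwritten before they are `days` old.
    """
    size = bytes_per_day * days * (100 + headroom_pct) // 100
    return {"create": name, "capped": True, "size": size}


# With pymongo (assumed driver) this could be sent as:
#   db.command(capped_create_command("events", 1_000_000_000))
# or equivalently via db.create_collection("events", capped=True, size=...)
```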
From MongoDB 2.2 onward, you can set a TTL on a collection (via a TTL index). This will expire old data from the collection automatically.
Follow this link: http://docs.mongodb.org/manual/tutorial/expire-data/
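As a rough sketch of what that looks like for the 30-day case: the collection name `events` and the field name `createdAt` are assumptions for illustration, and the indexed field needs to hold a date value for expiry to apply. The helper below only builds the index command document; sending it requires a driver.

```python
from datetime import timedelta


def ttl_index_command(collection: str, date_field: str,
                      keep: timedelta = timedelta(days=30)) -> dict:
    """Build a `createIndexes` command that adds a TTL index on `date_field`.

    Documents become eligible for deletion once the value in
    `date_field` is older than `keep`.
    """
    return {
        "createIndexes": collection,
        "indexes": [{
            "key": {date_field: 1},
            "name": f"{date_field}_ttl",
            "expireAfterSeconds": int(keep.total_seconds()),
        }],
    }


# With pymongo (assumed driver), the same index can be created directly:
#   db.events.create_index("createdAt", expireAfterSeconds=30 * 24 * 3600)
```

Expiry is handled by a background task, so documents are removed some time after they pass the threshold rather than at the exact moment.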