
Is it a good idea to generate per-day collections in MongoDB?

Is it a good idea to create one collection per day for that day's data? (We could start with per-day collections and move to per-hour ones if there is too much data.) Is there a limit on the number of collections we can create in MongoDB, and does maintaining so many collections add overhead that hurts performance?

To give you more context, the data will be similar to Facebook feeds, and only the latest data (say, the last week or month) really matters to us. Per-day collections keep the number of documents in each collection low, which would probably make access fast. If we ever need old data, we can fall back to the older collections. Does this make sense, or am I heading in the wrong direction?

amit_saxena asked Jun 27 '13 08:06



2 Answers

What you actually need is to archive the old data. I would suggest taking a look at this thread on the MongoDB mailing list:
https://groups.google.com/forum/#!topic/mongodb-user/rsjQyF9Y2J4 The last post there, from Michael Dirolf (10gen), says:

"The OS will handle LRUing out data, so if all of your queries are touching the same portion of data that should stay in memory independently of the total size of the collection."

So I guess you can stay with a single collection, and good indexes will do the work.
Anyhow, if the collection grows too big, you can always run a manual archive process.
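
As a rough illustration of the single-collection approach, here is a minimal Python sketch. It assumes a pymongo-style collection object and a hypothetical `created_at` timestamp field on each feed document; neither name comes from the question, so adapt them to your schema.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical field name; the question does not say how feed items are timestamped.
DATE_FIELD = "created_at"


def ensure_recent_index(collection):
    """Create a descending index on the timestamp field."""
    # -1 is the descending direction (the value of pymongo.DESCENDING).
    return collection.create_index([(DATE_FIELD, -1)])


def recent_filter(now, days=7):
    """Build a query filter matching documents from the last `days` days."""
    return {DATE_FIELD: {"$gte": now - timedelta(days=days)}}


def recent_feeds(collection, now=None, days=7):
    """Return a cursor over the last `days` days of documents, newest first."""
    now = now or datetime.now(timezone.utc)
    return collection.find(recent_filter(now, days)).sort(DATE_FIELD, -1)
```

With the index in place, a query for the last week only has to touch the recent end of the index, which fits the point in the quote above about the hot portion of the data staying in memory.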

Tamir answered Nov 19 '22 06:11



Yes, there is a limit to the number of collections you can make. From the Mongo documentation Abhishek referenced:

The limitation on the number of namespaces is the size of the namespace file divided by 628.

A 16 megabyte namespace file can support approximately 24,000 namespaces. Each index also counts as a namespace.

Each index counts as a namespace as well, but even so, at one new collection per day it would take on the order of 30 to 60 years to hit that limit, depending on whether you count each collection's default _id index.
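
The arithmetic behind that estimate can be sketched in a few lines of Python. This is a back-of-the-envelope calculation only: it assumes exactly one new collection per day, takes the 24,000 figure from the documentation quote above, and ignores any extra indexes or system namespaces.

```python
def years_until_namespace_limit(namespace_limit, namespaces_per_day):
    """Years of daily collection creation before the namespace limit is reached."""
    days = namespace_limit / namespaces_per_day
    return days / 365.25


# Approximate figure quoted above from the MongoDB documentation.
NAMESPACE_LIMIT = 24_000

# One namespace per day: counting the collection only.
collections_only = years_until_namespace_limit(NAMESPACE_LIMIT, 1)

# Two namespaces per day: the collection plus its default _id index.
with_id_index = years_until_namespace_limit(NAMESPACE_LIMIT, 2)

print(f"collections only: {collections_only:.1f} years")
print(f"with _id index:   {with_id_index:.1f} years")
```

Every additional index per collection shortens the estimate further, and switching to per-hour collections would divide it by 24.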

However, have you considered what happens when you want data that spans collections? For example, if you wanted to know how many users had feeds updated in a given week, you would be in a tight spot, because querying across collections is not trivial.
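
To make the cross-collection problem concrete, here is a hedged Python sketch of what the weekly question would involve with per-day collections. The `feeds_YYYYMMDD` naming scheme and the `user_id` field are hypothetical, and `db` is assumed to be a pymongo-style database object that returns a collection for `db[name]`.

```python
from datetime import date, timedelta


def daily_collection_names(end_day, days, prefix="feeds_"):
    """List the per-day collection names covering `days` days ending at `end_day`."""
    return [
        f"{prefix}{(end_day - timedelta(days=offset)):%Y%m%d}"
        for offset in range(days - 1, -1, -1)
    ]


def distinct_users_across_days(db, end_day, days, user_field="user_id"):
    """Count distinct users over several per-day collections, one query per collection."""
    users = set()
    for name in daily_collection_names(end_day, days):
        # Each day needs its own round trip; the results are merged on the client.
        users.update(db[name].distinct(user_field))
    return len(users)
```

With a single collection, the same question is one query with a date-range filter, and the merging happens on the server.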

I would recommend instead keeping one collection for the data and simply moving old data out periodically, as Tamir recommended. You can easily write a job that moves data out of the collection every week or every month.
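
Such a job might look roughly like the following Python sketch. It is an illustration under assumptions, not a tested production script: `source` and `archive` are assumed to be pymongo-style collection objects, and the `created_at` field name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def archive_cutoff(now=None, keep_days=30):
    """Return the timestamp before which documents count as old."""
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=keep_days)


def archive_old_documents(source, archive, cutoff,
                          date_field="created_at", batch_size=1000):
    """Move documents older than `cutoff` from `source` to `archive`.

    Returns the number of documents moved.
    """
    moved = 0
    while True:
        # Fetch one batch of old documents at a time to bound memory use.
        batch = list(source.find({date_field: {"$lt": cutoff}}).limit(batch_size))
        if not batch:
            return moved
        # Copy first, delete second: a crash in between leaves the batch
        # in both collections instead of losing it.
        archive.insert_many(batch)
        source.delete_many({"_id": {"$in": [doc["_id"] for doc in batch]}})
        moved += len(batch)
```

A weekly or monthly scheduler entry could then call something like `archive_old_documents(db.feeds, db.feeds_archive, archive_cutoff())`, where both collection names are placeholders. Note that rerunning after a crash between the copy and the delete would hit duplicate `_id` values in the archive, so a real job would need to handle that case.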

ryan1234 answered Nov 19 '22 05:11
