Mongodb: multiple collections or one big collection w/ index

Tags:

mongodb

I need help modeling my data in Mongo. Most of my experience has been with relational DBs; I am just starting out with Mongo. I am modeling data for different events.

  1. Each 'event' will have the same fields.
  2. Each 'event' will have hundreds to millions of documents/rows.
  3. Events are dynamic, i.e. new ones will be created as needed, e.g. a new 'Summer Olympics 2016' event.

Probably most important: when dealing with events (CRUD operations), users will have to specify an event name.

I can see a couple of ways to do this so far and I don't want to make a major mistake in setting up my data model the 'wrong' way.

1) One 'events' collection that holds data for all events, with an index on the event name. A query would look something like:

    db.events.find({event: 'Summer Olympics 2012'})
    {event: 'Summer Olympics 2012', attributes: [{name: 'joe smith', ...}]}
    {event: 'Summer Olympics 2012', attributes: [{name: 'jane doe', ...}]}
    {event: 'Summer Olympics 2012', attributes: [{name: 'john avery', ...}]}
    {event: 'Summer Olympics 2012', attributes: [{name: 'ted williams', ...}]}

    db.events.find({event: 'Summer Olympics 2013'})
    {event: 'Summer Olympics 2013', attributes: [{name: 'steve smith', ...}]}
    {event: 'Summer Olympics 2013', attributes: [{name: 'amy jones', ...}]}
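For completeness, the index this option relies on would be created like so (a minimal shell sketch; the collection and field names come from the example above):

    // index the event name so per-event lookups do not scan every document
    db.events.createIndex({event: 1})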

2) A collection for each new event that comes along, with a separate collection to keep track of all event names. No index on the event name is needed, since each event is stored in its own collection.

    // multiple collections, create new ones as needed
    db.summer2012.find() // get Summer 2012 docs
    db.summer2016.find() // get Summer 2016 docs

    // 'events' collection
    db.events.find() // list all events that I would have collections for
    {name: 'summer2012', title: 'Summer Olympics 2012'}
    {name: 'summer2016', title: 'Summer Olympics 2016'}
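A sketch of how option #2 might register a new event at runtime; the addEvent helper is hypothetical, but db.getCollection() is the standard shell way to resolve a collection from a dynamic name:

    // hypothetical helper: record the event in the catalog collection;
    // the per-event collection itself is created lazily on first insert
    function addEvent(name, title) {
      db.events.insert({name: name, title: title});
    }

    addEvent('summer2016', 'Summer Olympics 2016');
    db.getCollection('summer2016').insert({name: 'jane doe'});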

For #1 I am a little worried that once I reach 100 events, each with millions of documents, lookups per 'event' will be slow, even if one of the events only has 500 documents.
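One way to test that worry directly (a sketch, assuming the index from option #1 is in place) is to ask the query planner whether the lookup actually uses the index:

    // an indexed lookup should report a BtreeCursor (older shells)
    // or an IXSCAN stage (newer shells) rather than a full collection scan
    db.events.find({event: 'Summer Olympics 2012'}).explain()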

For #2, am I 'skirting' the Mongo model here by creating a new collection each time a new event comes along?

Any comments/ideas are welcome, as I really have no idea which one is going to end up performing better or if one or the other would get me into more trouble down the road. I have looked around (Mongo's site included) and I really cannot find a concrete answer.

asked Mar 09 '13 by lostintranslation



1 Answer

From the Mongo docs on data modeling:

In certain situations, you might choose to store information in several collections rather than in a single collection.

Consider a sample collection logs that stores log documents for various environments and applications. The logs collection contains documents of the following form:

{ log: "dev", ts: ..., info: ... } { log: "debug", ts: ..., info: ...}

If the total number of documents is low, you may group documents into collections by type. For logs, consider maintaining distinct log collections, such as logs.dev and logs.debug. The logs.dev collection would contain only the documents related to the dev environment.

Generally, having a large number of collections has no significant performance penalty and results in very good performance. Distinct collections are very important for high-throughput batch processing.
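One concrete illustration of that batch-processing point, sketched from the docs' logs example: discarding one environment's data becomes a single collection drop instead of a mass delete that touches every matching document:

    // single collection: each dev document must be matched and deleted
    db.logs.remove({log: "dev"})

    // separate collections: dropping the dev logs is one cheap operation
    db.logs.dev.drop()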

I also spoke with a 10gen (MongoDB) engineer. For really large collections he listed multiple benefits of separating the data out into smaller, more specific collections. His comment on using one collection for all the data with an index was:

Just because you can do something does not mean you should. Model your data appropriately. It may be easy to store everything in one large collection and index it, but that is not always the best approach.

answered Sep 30 '22 by lostintranslation