MongoDb Database vs Collection

Tags:

mongodb

I am designing a system with MongoDb (64 bit version) to handle a large amount of users (around 100,000) and each user will have large amounts of data (around 1 million records).

What is the best design strategy?

  1. Dump all records in a single collection

  2. Have a collection for each user

  3. Have a database for each user.

Many Thanks,

Asked Dec 18 '12 by DafaDil

People also ask

What is the difference between database and collection in MongoDB?

Instead of tables, a MongoDB database stores its data in collections. A collection holds one or more BSON documents. Documents are analogous to records or rows in a relational database table. Each document has one or more fields; fields are similar to the columns in a relational database table.

What is the difference between a database and a collection?

A database contains collections, a collection contains documents, and each document contains data; they nest inside one another.

Is collection a database in MongoDB?

A collection in MongoDB is a group of documents. Collections in a NoSQL database like MongoDB correspond to tables in relational database management systems (RDBMS) or SQL databases. As such, collections don't enforce a set schema, and documents within a single collection can have widely different fields.

What is a collection in MongoDB?

A collection is a grouping of MongoDB documents. Documents within a collection can have different fields. A collection is the equivalent of a table in a relational database system. A collection exists within a single database.
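As a minimal illustration of that hierarchy in the mongo shell (the database name mydb, the collection people, and all field values are made up for this sketch):

    use mydb                                  // switch to (and lazily create) a database
    // the "people" collection is created implicitly on first insert
    db.people.insertOne({ name: "Ada", born: 1815 })
    // a document with completely different fields is still valid in the same collection
    db.people.insertOne({ name: "Alan", languages: ["Lisp", "C"] })
    show collections                          // lists "people" inside the current database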


2 Answers

So you're looking at somewhere in the region of 100 billion records (1 million records * 100,000 users).

The preferred way to deal with this much data is to create a sharded cluster that splits the data out over several servers, presented to clients as a single logical unit via the mongos query router.

Therefore the answer to your question is option 1: put all your records in a single sharded collection.
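As a sketch of what that looks like with the standard mongos admin helpers (the database name app, the collection name records, and the shard key userId are assumptions, not something given in the question):

    // run against a mongos router, not a plain mongod
    sh.enableSharding("app")                          // allow collections in "app" to be sharded
    // the shard key must be chosen up front; it cannot easily be changed later
    sh.shardCollection("app.records", { userId: 1 })  // one big collection, partitioned by user
    sh.status()                                       // inspect how chunks are spread over shards

From the application's point of view it is still one collection; the cluster takes care of distributing the chunks across the shards.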

The number of shards required and configuration of the cluster is related to the size of the data and other factors such as the quantity and distribution of reads and writes. The answers to those questions are probably very specific to your unique situation, so I won't attempt to guess them.

I'd probably start by deciding how many shards you have the time and machines to set up, and testing the system on a cluster of that many machines. Based on the performance of that, you can decide whether you need more or fewer shards in your cluster.

Answered Sep 29 '22 by chrisbunney

So you are looking at around 100,000,000,000 (100 billion) detail records overall for 100K users?

What many people don't seem to understand is that MongoDB is good at horizontal scaling, and horizontal scaling normally means scaling huge single collections of data across many (many) servers in a huge cluster.

So if you simply use a single collection for each kind of common data (i.e. one collection called user and one called detail, as sketched below), you are already playing to MongoDB's core purpose and design.
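A hedged sketch of that two-collection layout (the field names are illustrative only, not taken from the question):

    db.user.insertOne({ _id: 42, name: "alice" })
    // every detail record points back at its owner via userId
    db.detail.insertOne({ userId: 42, recordedAt: new Date(), reading: 3.14 })
    // index the field you will shard and query on, so per-user lookups stay cheap
    db.detail.createIndex({ userId: 1 })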

MongoDB, as mentioned by others, is not so good at scaling vertically across many collections. It has an nssize limit to begin with, and even though around 12K collections is the estimated ceiling, in reality, due to index size, you can have as few as 5K collections in your database.

So a collection per user is not feasible at all. It would be using MongoDB against its core principles.

Having a database per user involves the same problems as having a collection per user, and possibly more.

I have never encountered anyone unable to scale MongoDB into the billions on an optimised set-up, and while I have not seen it taken close to the 100s of billions (or beyond), I do not see why it cannot be; after all, Facebook is able to make MySQL scale into the 100s of billions of records (across 32K+ shards), and the sharding concept is similar between the two databases.

So the theory and the possibility of doing this are there. It is all about choosing the right schema, shard concept and shard key (and servers, network, etc.).

If you were to run into problems, you could split archived or deleted items away from the main collection, but I think that is overkill. Instead, you want to make sure that MongoDB knows where each segment of your huge dataset is at any given point in time, and ensure that this data is always hot; that way, queries that don't have to do a global scatter-gather operation should be quite fast.
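To illustrate the difference (assuming, as in the sketch above, a shard key that starts with userId):

    // includes the shard key, so mongos routes it to a single shard (targeted)
    db.detail.find({ userId: 42, recordedAt: { $gte: ISODate("2012-12-01") } })
    // omits the shard key, so mongos must ask every shard (scatter-gather)
    db.detail.find({ recordedAt: { $gte: ISODate("2012-12-01") } })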

Answered Sep 29 '22 by Sammaye