 

What is the max size of a collection in MongoDB?

I would like to know the maximum size of a collection in MongoDB. The MongoDB limitations documentation mentions that a single MMAPv1 database has a maximum size of 32TB.

Does this mean the maximum size of a collection is 32TB? If I want to store more than 32TB in one collection, what is the solution?

asked Nov 26 '15 by Aravind Kumar Anugula

People also ask

How many items can be in a MongoDB collection?

There is no fixed limit on the number of documents in a regular collection. If you specify a maximum number of documents for a capped collection using the max parameter of create, that limit must be less than 2^32 documents.
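For illustration, here is a minimal mongo-shell sketch of creating a capped collection with an explicit document cap; the collection name "events" and the sizes are placeholder values, not anything taken from the question:

db.createCollection("events", {
  capped: true,             // capped collections require an explicit size
  size: 10 * 1024 * 1024,   // maximum size in bytes
  max: 1000000              // maximum number of documents; must be below 2^32
});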

How do I increase the size of a collection in MongoDB?

Running db.runCommand({ convertToCapped: "events", size: 10 * 1024 * 1024 }); will set the byte size of the capped collection (larger or smaller).

Can MongoDB handle millions of data?

Working with MongoDB and Elasticsearch together is a sound choice for processing millions of records in real time. The same structures and concepts apply to larger datasets and work extremely well there too.

How do I find the size of a collection in MongoDB?

The db.collection.totalSize() method reports the total size of a collection, including the size of all documents and all indexes on the collection. It returns the total size in bytes of the data in the collection plus the size of every index on the collection.
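As a hedged example, assuming a collection named "events", the related sizing helpers in the mongo shell look like this:

db.events.totalSize();    // bytes of data plus all indexes
db.events.dataSize();     // bytes of the documents only
db.events.storageSize();  // bytes of storage allocated to the collection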


1 Answer

There are theoretical limits, as I will show below, but even the lower bound is pretty high. It is not easy to calculate the limits correctly, but the order of magnitude should be sufficient.

MMAPv1

The actual limit depends on a few things, such as the length of shard names and the like (which adds up if you have a couple of hundred thousand of them), but here is a rough calculation with real-life data.

Each shard needs some space in the config database, which, like any other database, is limited to 32TB on a single machine or in a replica set. On the servers I administer, the average size of an entry in config.shards is 112 bytes. Furthermore, each chunk needs about 250 bytes of metadata. Let us assume an optimal chunk size of close to 64MB.

With 32TB per shard and 64MB chunks, we can have at most 500,000 chunks per shard. 500,000 * 250 bytes equals 125MB of chunk information per shard. Adding the 112-byte config.shards entry, we end up with about 125.000112MB of config data per shard if we max everything out. Dividing 32TB by that value shows that we can have a maximum of slightly under 256,000 shards in a cluster.

Each shard in turn can hold 32TB worth of data. 256,000 * 32TB is 8.192 exabytes, or 8,192,000 terabytes. That would be the limit for our example.
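To make the arithmetic easy to check, here is a back-of-the-envelope version of the same calculation in plain shell JavaScript; all numbers are the rough estimates used above, not measured values:

var chunkSizeMB = 64;                                    // assumed optimal chunk size
var chunksPerShard = 32 * 1000 * 1000 / chunkSizeMB;     // 32TB per shard -> 500,000 chunks
var configMBPerShard = chunksPerShard * 250 / 1e6        // chunk metadata, ~125MB
                     + 112 / 1e6;                        // config.shards entry
var maxShards = 32 * 1000 * 1000 / configMBPerShard;     // the config db itself is capped at 32TB
var maxDataTB = maxShards * 32;                          // each shard holds 32TB
// maxShards comes out slightly under 256,000 and maxDataTB at roughly 8,192,000TB (about 8 exabytes)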

Let's say it's 8 exabytes. As of now, this easily translates to "enough for all practical purposes". To give you an impression: all the data held by the Library of Congress (arguably one of the biggest libraries in the world in terms of collection size) is estimated at around 20TB, including audio, video, and digital materials. You could fit that into our theoretical MongoDB cluster some 400,000 times. Note that this is the lower bound of the maximum size, using conservative values.

WiredTiger

Now for the good part: the WiredTiger storage engine does not have this limitation. The database size is not limited (since there is no limit on how many data files can be used), so we can have an unlimited number of shards. Even when those shards run on MMAPv1 and only our config servers run on WiredTiger, the size of the cluster becomes nearly unlimited. At some point, the 16.8M TB limit on RAM in a 64-bit system might cause problems and force the indices of the config.shards collection to be swapped to disk, stalling the system. I can only guess, since my calculator refuses to work with numbers in that range (and I am too lazy to do it by hand), but I estimate the limit here to be in the two-digit yottabyte range (and the space needed to host it somewhere in the size of Texas).
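If you want to know which of the two limits applies to a given node, a minimal mongo-shell check of the storage engine in use (assuming you are connected to the node in question):

db.serverStatus().storageEngine.name;   // "wiredTiger" or "mmapv1"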

Conclusion

Do not worry about the maximum data size in a sharded environment. No matter what, it is by far enough, even with the most conservative approach. Use sharding, and you are done. By the way: even 32TB is a hell of a lot of data; most clusters I know hold less data and shard because IOPS and RAM utilization exceed a single node's capacity.
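As a sketch of the "use sharding" advice, assuming a connection to a mongos and an illustrative database mydb with a collection events sharded on a hashed _id key:

sh.enableSharding("mydb");
sh.shardCollection("mydb.events", { _id: "hashed" });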

answered Oct 11 '22 by Markus W Mahlberg