Understanding MongoDB BSON Document size limit

Tags: mongodb, bson

People also ask

What is the maximum size of document in MongoDB?

The maximum size an individual document can be in MongoDB is 16MB, with a nesting depth of up to 100 levels. Edit: there is no maximum size for an individual MongoDB database.

How does MongoDB define maximum number of documents in collection?

Unless you create a capped collection, there is no limit to the number of documents in a collection. There is a 16MB limit on the size of a single document (use GridFS in that situation), and the storage engine imposes limits on the size of the database and the data; a capped-collection sketch follows below.
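For illustration, here is a minimal sketch of creating a capped collection with PyMongo (the connection string, names, and size bounds are arbitrary assumptions for the example):

# A capped collection bounds its total size in bytes (and, optionally,
# its document count); regular collections have no document-count limit.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["mydb"]
db.create_collection("events", capped=True, size=1024 * 1024, max=1000)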

Can MongoDB handle billions of documents?

Mongo can easily handle billions of documents and can have billions of documents in a single collection, but remember that the maximum document size is 16MB. There are many folks with billions of documents in MongoDB, and there are lots of discussions about it on the MongoDB Google User Group.


First off, this limit is actually being raised in the next version to 8MB or 16MB ... but I think, to put this into perspective, Eliot from 10gen (who developed MongoDB) puts it best:

EDIT: The size has been officially 'raised' to 16MB

So, on your blog example, 4MB is actually a whole lot. For example, the full uncompressed text of "The War of the Worlds" is only 364k (HTML): http://www.gutenberg.org/etext/36

If your blog post is that long with that many comments, I for one am not going to read it :)

For trackbacks, if you dedicated 1MB to them, you could easily have more than 10k (probably closer to 20k)

So except for truly bizarre situations, it'll work great. And in the exceptional case of spam, I really don't think you'd want a 20MB object anyway. I think capping trackbacks at 15k or so makes a lot of sense no matter what, for performance. Or at least special-casing it if it ever happens.

-Eliot

I think you'd be pretty hard pressed to reach the limit ... and over time, if you upgrade ... you'll have to worry less and less.

The main point of the limit is so you don't use up all the RAM on your server (since the entire document must be loaded into RAM when you query it).

So the limit is some % of normal usable RAM on a common system ... which will keep growing year on year.

Note on Storing Files in MongoDB

If you need to store documents (or files) larger than 16MB, you can use the GridFS API, which will automatically break the data into segments and stream them back to you (thus avoiding the issue with size limits and RAM).

Instead of storing a file in a single document, GridFS divides the file into parts, or chunks, and stores each chunk as a separate document.

GridFS uses two collections to store files. One collection stores the file chunks, and the other stores file metadata.

You can use this method to store images, files, videos, etc. in the database, much as you might in a SQL database. I have even used this to store multi-gigabyte video files.
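As a rough illustration, here is a minimal sketch of the GridFS API via PyMongo (the connection string, database name, and file name are assumptions for the example):

# Store and retrieve a large file through GridFS with PyMongo.
from pymongo import MongoClient
import gridfs

client = MongoClient("mongodb://localhost:27017")
fs = gridfs.GridFS(client["mydb"])

# put() splits the file into chunks (255k each by default), stores them
# in the fs.chunks collection, and writes metadata to fs.files.
with open("video.mp4", "rb") as f:
    file_id = fs.put(f, filename="video.mp4")

# get() streams the chunks back and reassembles the original bytes, so
# the 16MB document limit never applies to the file as a whole.
data = fs.get(file_id).read()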


Many in the community would prefer no limit, with warnings about performance instead; see this comment for a well-reasoned argument: https://jira.mongodb.org/browse/SERVER-431?focusedCommentId=22283&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-22283

My take: the lead developers are stubborn about this issue because they decided it was an important "feature" early on. They're not going to change it anytime soon because their feelings are hurt that anyone questioned it. It's another example of personality and politics detracting from a product in open-source communities, but this is not really a crippling issue.


Posting a clarifying answer here for those who get directed here by Google:

The document size includes everything in the document, including subdocuments, nested objects, etc.

So a document of:

{
  "_id": {},
  "na": [1, 2, 3],
  "naa": [
    { "w": 1, "v": 2, "b": [1, 2, 3] },
    { "w": 5, "b": 2, "h": [{ "d": 5, "g": 7 }, {}] }
  ]
}

has a maximum size of 16MB.

Subdocuments and nested objects are all counted towards the size of the document.
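You can check this yourself with a small sketch using PyMongo's bson module (bson.encode requires PyMongo 3.9 or later; the document is the example above):

import bson

doc = {
    "_id": {},
    "na": [1, 2, 3],
    "naa": [
        {"w": 1, "v": 2, "b": [1, 2, 3]},
        {"w": 5, "b": 2, "h": [{"d": 5, "g": 7}, {}]},
    ],
}

# The length of the encoded bytes is the document's BSON size;
# subdocuments and nested arrays are all included in the total.
size_bytes = len(bson.encode(doc))
print(size_bytes)  # must stay at or below 16 * 1024 * 1024 to insert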


I have not yet seen a problem with the limit that did not involve large files stored within the document itself. There are already a variety of databases which are very efficient at storing/retrieving large files; they are called operating systems. The database exists as a layer over the operating system. If you are using a NoSQL solution for performance reasons, why would you want to add additional processing overhead to the access of your data by putting the DB layer between your application and your data?

JSON is a text format, so if you are accessing your data through JSON, this is especially true of binary files, which have to be encoded in uuencode, hexadecimal, or Base64. The conversion path might look like

binary file <> JSON (encoded) <> BSON (encoded)

It would be more efficient to put the path (URL) to the data file in your document and keep the data itself in binary.
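A hedged sketch of that pattern, storing only a reference (the collection and field names here are made up for the example):

from pymongo import MongoClient

videos = MongoClient("mongodb://localhost:27017")["mydb"]["videos"]

# Store a path (or URL) to the binary rather than the binary itself;
# the application reads the file from disk or object storage directly.
videos.insert_one({
    "title": "demo",
    "path": "/var/media/demo.mp4",
    "content_type": "video/mp4",
})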

If you really want to keep these files of unknown length in your DB, then you would probably be better off putting these in GridFS and not risking killing your concurrency when the large files are accessed.