In short: If you have a large number of documents with varying sizes, where relatively few documents hit the maximum object size, what are the best practices to store those documents in MongoDB?
I have a set of documents like:
{_id: ...,
values: [12, 13, 434, 5555 ...]
}
The length of the values list varies hugely from one document to another. For the majority of documents it will have a few elements; for a few it will have tens of millions of elements, and those documents will hit the maximum object size limit in MongoDB. The trouble is that any special solution I come up with for those very large (and relatively few) documents might affect how I store the small documents, which would otherwise live happily in a MongoDB collection.
As far as I can see, I have the following options. I would appreciate any input on the pros and cons of those, and on any other option that I have missed.
1) Use another datastore: That seems too drastic. I like MongoDB, and it's not like I hit the size limit for many objects. In the worst case, my application could treat the very large objects and the rest differently. It just doesn't seem elegant.
2) Use GridFS to store the values: Like a blob in a traditional DB, I could keep the first few thousand elements of values in the document and, if there are more elements in the list, keep the rest in a GridFS object as a binary file. I wouldn't be able to search in this part, but I can live with that.
3) Abuse GridFS: I could keep every document in GridFS. For the majority of the (small) documents the binary chunk would be empty because the files collection would be able to keep everything. For the rest I could keep the excess elements in the chunks collection. Does that introduce an overhead compared to option #2?
4) Really abuse GridFS: I could use the optional fields in the files collection of GridFS to store all elements in the values. Does GridFS do smart chunking also for the files collection?
5) Use an additional "relational" collection to store the one-to-many relation, but the number of documents in this collection would easily exceed a hundred billion rows.
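A middle ground between options 2 and 5 can be sketched as follows: keep the first elements inline and move the excess into overflow documents that each hold many values, so the second collection grows with the number of buckets rather than with the number of values. This is only a sketch under assumptions. The function name split_into_buckets, the fields overflow_buckets, parent_id and seq, and both limits are hypothetical, and the code builds plain dictionaries instead of talking to a server.

```python
def split_into_buckets(doc_id, values, inline_limit=1000, bucket_size=100_000):
    """Split one logical document into a main document plus overflow buckets.

    All field names and default limits here are illustrative assumptions,
    not anything prescribed by MongoDB.
    """
    # The first `inline_limit` values stay in the main document, so small
    # documents keep exactly the shape they have today.
    head = list(values[:inline_limit])
    overflow = values[inline_limit:]

    # Each overflow document holds up to `bucket_size` values and points
    # back at its parent; `seq` preserves the original ordering.
    buckets = [
        {
            "parent_id": doc_id,
            "seq": seq,
            "values": list(overflow[start:start + bucket_size]),
        }
        for seq, start in enumerate(range(0, len(overflow), bucket_size))
    ]

    main = {"_id": doc_id, "values": head, "overflow_buckets": len(buckets)}
    return main, buckets


def reassemble(main, buckets):
    """Rebuild the full values list from a main document and its buckets."""
    values = list(main["values"])
    for bucket in sorted(buckets, key=lambda b: b["seq"]):
        values.extend(bucket["values"])
    return values
```

With these defaults a document with a few elements produces no buckets at all, while a list of tens of millions of elements produces a few hundred bucket documents instead of tens of millions of single-value rows. The bucket size would need tuning so that each bucket stays comfortably below the document size limit.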
In MongoDB, use GridFS for storing files larger than 16 MB. In some situations, storing large files may be more efficient in a MongoDB database than on a system-level filesystem. If your filesystem limits the number of files in a directory, you can use GridFS to store as many files as needed.
The maximum size of an individual document in MongoDB is 16 MB, and documents can be nested up to 100 levels deep. Edit: There is no maximum size for an individual MongoDB database.
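To get a feel for how quickly a list of integers reaches that limit, the encoded size of a BSON array can be estimated by hand. In BSON an array is stored as a document whose keys are the element indexes written as decimal strings, so each element costs one type byte, the index digits, a terminating zero byte, and the value itself. The sketch below applies that arithmetic. The function names are made up for illustration, and the estimate covers only the array, not the enclosing document's _id or field name.

```python
MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024  # the 16 MB limit, in bytes


def bson_int_array_size(n, value_bytes=8):
    """Estimate the encoded size in bytes of a BSON array of n integers.

    value_bytes is 4 for 32-bit integers and 8 for 64-bit integers.
    """
    # Total number of digits used by the index keys "0", "1", ..., str(n - 1).
    key_bytes = 0
    digits = 1
    start = 0
    while start < n:
        end = min(n, 10 ** digits)  # indexes below 10**digits have `digits` digits
        key_bytes += (end - start) * digits
        start = end
        digits += 1

    # Per element: 1 type byte + key digits + 1 zero byte + the value.
    # Per array: 4-byte length prefix + 1 terminating zero byte.
    return 4 + n * (1 + 1 + value_bytes) + key_bytes + 1


def fits_in_one_document(n, value_bytes=8):
    """True if an array of n integers alone stays within the document limit."""
    return bson_int_array_size(n, value_bytes) <= MAX_BSON_DOCUMENT_BYTES
```

By this estimate, ten million 32-bit integers need roughly 129 million bytes, several times the limit, while a list of a few thousand elements is nowhere near it.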
MongoDB is a NoSQL database with a flexible schema. It stores large amounts of data in a naturally traversable format, which makes it a good choice for storing, querying, and analyzing big data.
MongoDB requires approximately 1 GB of RAM per 100,000 assets. If the system has to start swapping memory to disk, performance will suffer severely, so swapping should be avoided.
If you have large documents, try to store some metadata about them in MongoDB, and put the rest of the data (the part you will not be querying on) outside.
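One way to picture that split, as a rough sketch only: pack the values into a binary file on disk and keep a small metadata document that records where the file is and a few summary fields you might want to query on. The function names, the field names (count, min, max, blob_path) and the choice of a local file are all assumptions made for illustration; the values are written as signed 64-bit integers in the machine's native byte order, so the file is not portable across architectures as written.

```python
import os
import tempfile
from array import array


def externalize_values(doc_id, values, directory):
    """Write values to a binary file and return a small metadata document."""
    packed = array("q", values)  # "q" = signed 64-bit integers
    path = os.path.join(directory, f"{doc_id}.bin")
    with open(path, "wb") as fh:
        packed.tofile(fh)
    # Only this dictionary would be stored in MongoDB.
    return {
        "_id": doc_id,
        "count": len(packed),
        "min": min(packed) if packed else None,
        "max": max(packed) if packed else None,
        "blob_path": path,  # hypothetical pointer to the external data
    }


def load_values(metadata):
    """Read the externally stored values back using the metadata document."""
    packed = array("q")
    with open(metadata["blob_path"], "rb") as fh:
        packed.fromfile(fh, metadata["count"])
    return packed.tolist()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        meta = externalize_values("doc1", [12, 13, 434, 5555], tmp)
        print(meta["count"], load_values(meta))
```

The metadata document stays tiny no matter how long the list is, at the cost of not being able to query the individual values through MongoDB.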