 

Tips to save space in mongodb

Various MongoDB services meter by disk use. What are some tips for saving space when working with MongoDB?

Thanks.

Mark asked Nov 26 '10 15:11




1 Answer

This question is really rather vague. Some things which may or may not apply to you (in no particular order):

Shorten verbose field names

This is best illustrated with an example:

{
    surname: "Smith",
    forename: "John",
    location: { grid_e: 100.02, grid_n: 450.08 }
}

The document above could be shortened by abbreviating its field names:

{
    sn: "Smith",
    fn: "John",
    loc: { e: 100.02, n: 450.08 }
}

This gives only a tiny saving per document, because field names are stored inside every document. The saving scales with the number of fields in each document and with the number of documents, so it could become significant if you have millions. Here is a superb post discussing the benefits and drawbacks of this method.
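To get a feel for the numbers, here is a rough sketch that compares the two documents above. It uses the length of the JSON text as a stand-in for the stored size, which is an assumption: the real BSON size differs, although the difference in field-name lengths carries over. The document count is a made-up figure.

```javascript
// Rough estimate only: JSON text length stands in for the stored (BSON) size.
const approxBytes = (doc) => new TextEncoder().encode(JSON.stringify(doc)).length;

const verbose = {
  surname: "Smith",
  forename: "John",
  location: { grid_e: 100.02, grid_n: 450.08 },
};
const compact = {
  sn: "Smith",
  fn: "John",
  loc: { e: 100.02, n: 450.08 },
};

// Only the field names differ, so this is the total length of the characters removed.
const savedPerDoc = approxBytes(verbose) - approxBytes(compact);

const documentCount = 10000000; // hypothetical collection size
const savedTotalMiB = (savedPerDoc * documentCount) / (1024 * 1024);

console.log(`${savedPerDoc} bytes saved per document`);
console.log(`about ${savedTotalMiB.toFixed(0)} MiB saved over ${documentCount} documents`);
```

Measure on your own data before renaming anything, since storage-level compression can shrink the benefit.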

Capped Collections

Capped collections are fixed-size collections: you specify a maximum size in bytes and, optionally, a maximum number of documents. They work in a first-in-first-out manner (the oldest documents are discarded to make room for new ones). This is particularly applicable if you are logging and wish to store the most recent x documents, but old ones have no relevance.

There are some caveats to the use of capped collections. See the MongoDB docs for full details.
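In the mongo shell a capped collection is created with something like db.createCollection("log", { capped: true, size: 1048576, max: 1000 }), where the collection name and the limits are placeholders. The first-in-first-out behaviour can be illustrated with a toy in-memory model. This is only a sketch of the eviction rule, not how MongoDB implements it, and the CappedLog class is hypothetical.

```javascript
// Toy in-memory model of first-in-first-out eviction; not MongoDB's implementation.
class CappedLog {
  constructor(maxDocs) {
    this.maxDocs = maxDocs; // plays the role of the "max" option
    this.docs = [];
  }

  insert(doc) {
    this.docs.push(doc);
    // Once the cap is exceeded, discard the oldest documents first.
    while (this.docs.length > this.maxDocs) {
      this.docs.shift();
    }
  }
}

const log = new CappedLog(3);
for (let seq = 1; seq <= 5; seq++) {
  log.insert({ seq });
}

// Only the three most recent entries remain.
console.log(log.docs.map((d) => d.seq));
```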

Consider your documents' relationships

Documents can either have embedded documents or relationships to other documents (in other collections) foreign-key style. The pros and cons of each approach are discussed frequently, but ultimately it is for you to choose which approach works for you.

Taking the example of a blog, it may be that each blog post has an author. You could either embed this author information within each post, or you might choose to put them in their own authors or users collection. The latter approach would save space, particularly if many users often make many posts (rather than just one or two). Be aware that you will incur an extra database call since there are no joins.
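As a rough illustration of the space difference, the sketch below builds the same set of posts both ways and compares their approximate sizes. The author fields, the post count and the numeric id are invented for the example, and JSON text length is used as a proxy for the stored size.

```javascript
// Rough estimate only: JSON text length is a proxy for the stored size.
const approxBytes = (value) => new TextEncoder().encode(JSON.stringify(value)).length;

const author = { name: "Mark Embling", bio: "Writes about databases." }; // hypothetical fields
const postCount = 1000; // hypothetical number of posts by this one author

// Option 1: every post carries its own copy of the author.
const embeddedPosts = Array.from({ length: postCount }, (_, i) => ({
  title: `Post ${i}`,
  author,
}));

// Option 2: the author is stored once and each post keeps only an id.
const users = [{ _id: 1, ...author }];
const referencedPosts = Array.from({ length: postCount }, (_, i) => ({
  title: `Post ${i}`,
  user_id: 1,
}));

const embeddedSize = approxBytes(embeddedPosts);
const referencedSize = approxBytes(users) + approxBytes(referencedPosts);

console.log({ embeddedSize, referencedSize });
```

The gap grows with the size of the embedded author data and the number of posts per author. With only one or two posts per author the saving is negligible.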

Edit: Expanding on Relationships

Relationships between documents can be done in a couple of ways in addition to embedding them. You could just use the ID of the related document like so (reusing the blog example above):

{
    _id: <whatever>,
    title: "Document Relationships in MongoDB",
    body: "bla bla bla bla",
    // ...
    user_id: <id of the user document>
}

And in the users collection, that related document would exist:

{
    _id: <whatever>,
    name: "Mark Embling",
    email: "[email protected]",
    // ...
}

This is probably the simplest possible approach to relationships (besides embedding them), but it will be up to you to maintain it within your own code entirely. You will need to make the call to grab the related user when you need it, and to update it whenever that might be necessary. That said, I see nothing wrong with this approach, and have seen it used on a few occasions.
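The "grab the related user" step can be sketched as follows. The arrays and Map here are in-memory stand-ins for the posts and users collections, and the ids and the findPostWithAuthor helper are hypothetical. With a real driver each lookup would be a separate query.

```javascript
// In-memory stand-ins for the users and posts collections (hypothetical data).
const users = new Map([
  ["u1", { _id: "u1", name: "Mark Embling" }],
]);
const posts = [
  { _id: "p1", title: "Document Relationships in MongoDB", user_id: "u1" },
];

// Two lookups: one for the post, then one for the user it references.
function findPostWithAuthor(postId) {
  const post = posts.find((p) => p._id === postId);
  if (!post) {
    return null;
  }
  const author = users.get(post.user_id) || null;
  return { ...post, author };
}

console.log(findPostWithAuthor("p1"));
```

Nothing enforces that user_id points at an existing user, so the code has to cope with a missing author, as the null fallback does here.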

A similar approach is to use a DBRef, which is a more formal way of describing a relationship like the one above. Instead of storing just the ID of the other document, you store a DBRef: a formalized reference to another document. Both approaches described here are discussed in detail in the MongoDB docs. It is worth noting that a manual reference takes up (slightly) less space than a DBRef, since a DBRef holds extra (possibly redundant) information, such as which collection is referred to. A DBRef has the advantage of being supported natively by many of the driver libraries though, so it makes your life that little bit easier.
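For comparison, here is a sketch of the same post written with a manual reference and with a DBRef. A DBRef is a small document with $ref (the collection name) and $id fields, plus an optional $db field that is left out here. The string ids are placeholders and the size figures are JSON text lengths, so treat them as a rough proxy only.

```javascript
// Rough estimate only: JSON text length is a proxy for the stored size.
const approxBytes = (value) => new TextEncoder().encode(JSON.stringify(value)).length;

// Manual reference: just the id of the user document.
const postWithManualRef = {
  title: "Document Relationships in MongoDB",
  user_id: "u1", // placeholder id
};

// DBRef: records which collection the id belongs to as well as the id itself.
const postWithDbRef = {
  title: "Document Relationships in MongoDB",
  user: { $ref: "users", $id: "u1" },
};

const manualSize = approxBytes(postWithManualRef);
const dbRefSize = approxBytes(postWithDbRef);

console.log({ manualSize, dbRefSize, extra: dbRefSize - manualSize });
```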

Ultimately, which methods work and are relevant depends on what you are trying to do. Consider the options and the trade-offs, and make the call as to whether it's something you should do. And experiment.

Mark Embling answered Nov 15 '22 22:11