Why are key names stored in the document in MongoDB?

Tags:

mongodb

I'm curious about this quote from Kyle Banker's MongoDB In Action:

It’s important to consider the length of the key names you choose, since key names are stored in the documents themselves. This contrasts with an RDBMS, where column names are always kept separate from the rows they refer to. So when using BSON, if you can live with dob in place of date_of_birth as a key name, you’ll save 10 bytes per document. That may not sound like much, but once you have a billion such documents, you’ll have saved nearly 10 GB of storage space just by using a shorter key name. This doesn’t mean you should go to unreasonable lengths to ensure small key names; be sensible. But if you expect massive amounts of data, economizing on key names will save space.
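The arithmetic in the quote can be checked with a short sketch. It assumes the BSON layout in which each element stores its key as a NUL-terminated UTF-8 string; the helper name `key_bytes` is made up for illustration and is not part of any MongoDB driver.

```python
def key_bytes(key: str) -> int:
    """Bytes a key name occupies in a BSON element (assumed: UTF-8 bytes plus a NUL terminator)."""
    return len(key.encode("utf-8")) + 1


# Savings from the shorter key, per document and across a billion documents.
saved_per_doc = key_bytes("date_of_birth") - key_bytes("dob")
documents = 1_000_000_000
total_saved = saved_per_doc * documents

print(saved_per_doc)         # bytes saved per document
print(total_saved / 10**9)   # savings in decimal gigabytes
print(total_saved / 2**30)   # savings in binary gibibytes
```

The figure only covers the documents themselves; any indexes or storage-engine overhead are outside this estimate.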

I am interested in the reason why this is not optimized on the database server side. Would an in-memory lookup table of all key names in the collection impose a performance penalty that outweighs the potential space savings?


asked Jul 11 '12 09:07

c089

1 Answer

What you are referring to is often called "key compression"*. There are several reasons why it hasn't been implemented:

1. If you want it done, you can currently do it at the application/ORM/ODM level quite easily.
2. It's not necessarily a performance** advantage in all cases: think of collections with lots of key names, and/or key names that vary wildly between documents.
3. It might not provide a measurable performance** advantage at all until you have millions of documents.
4. If the server does it, the full key names still have to be transmitted over the network.
5. If compressed key names are transmitted over the network, then readability really suffers in the JavaScript console.
6. Compressing the entire JSON document offers an even better performance advantage.
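As a rough illustration of the first point, key shortening at the application level can be as small as a pair of dictionaries. This is only a sketch: the field names, the `SHORT_NAMES` table, and the helpers `to_stored` and `from_stored` are hypothetical, and it handles top-level keys only.

```python
# Hypothetical mapping from readable field names to the short names actually stored.
SHORT_NAMES = {"date_of_birth": "dob", "first_name": "fn", "last_name": "ln"}
LONG_NAMES = {short: long for long, short in SHORT_NAMES.items()}


def to_stored(doc: dict) -> dict:
    """Rename known top-level keys to their short form; unknown keys pass through unchanged."""
    return {SHORT_NAMES.get(key, key): value for key, value in doc.items()}


def from_stored(doc: dict) -> dict:
    """Reverse the renaming after reading a document back."""
    return {LONG_NAMES.get(key, key): value for key, value in doc.items()}


user = {"first_name": "Ada", "last_name": "Lovelace", "date_of_birth": "1815-12-10"}
stored = to_stored(user)        # what would be written to the collection
restored = from_stored(stored)  # what the application sees after a read

print(stored)
print(restored)
```

A real ORM/ODM layer would also have to translate key names in queries, projections, and index definitions, which is where most of the effort goes.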

Like all features, there's a cost-benefit analysis for implementing it, and (at least so far) other features have offered more "bang for the buck".

Full document compression is available as of version 3.0 (see the update below).

* An in-memory lookup table for key names is basically a special case of LZW style compression — that's more or less what most compression algorithms do.

** Compression provides both a space advantage and a performance advantage. Smaller documents mean that more documents can be read per I/O operation, so a system with fixed I/O capacity can read more documents per second.
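The footnotes above can be made concrete with a small experiment. This is a sketch under loud assumptions: JSON text stands in for BSON, Python's standard `zlib` module compressing a whole batch stands in for block-level compression, and the `make_batch` helper and its sample data are invented for illustration.

```python
import json
import zlib


def make_batch(key: str, count: int = 1000) -> bytes:
    """Serialize `count` small documents that all use the same key name."""
    docs = [{key: f"19{i % 100:02d}-01-01", "n": i} for i in range(count)]
    return "\n".join(json.dumps(doc) for doc in docs).encode("utf-8")


long_raw = make_batch("date_of_birth")
short_raw = make_batch("dob")
long_zip = zlib.compress(long_raw)
short_zip = zlib.compress(short_raw)

# Uncompressed, the long key costs 10 extra bytes per document.
print("raw:       ", len(long_raw), len(short_raw))
# Compressed, repeated key names become cheap back-references, so the gap shrinks.
print("compressed:", len(long_zip), len(short_zip))
```

Because a general-purpose compressor already replaces repeated key names with short references, most of the benefit of a dedicated key table comes along with it.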

Update

MongoDB versions 3.0 and up now have full document compression capability with the WiredTiger storage engine.

Two compression algorithms are available: snappy and zlib. The intent is for snappy to be the best choice for all-around performance, and for zlib to be the best choice for maximum storage capacity.

In my personal (non-scientific, but related to a commercial project) experimentation, snappy compression (we didn't evaluate zlib) offered significantly improved storage density at no noticeable net performance cost. In fact, there was slightly better performance in some cases, roughly in line with my previous comments/predictions.
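For reference, the compressor can be chosen per collection through the WiredTiger `block_compressor` setting. The sketch below only builds the option document; the helper name `wiredtiger_options`, the database name, and the collection name are hypothetical, and the commented-out call assumes PyMongo and a running server.

```python
def wiredtiger_options(compressor: str) -> dict:
    """Build a storageEngine option selecting a WiredTiger block compressor."""
    # Only the two compressors named in the answer are accepted in this sketch.
    if compressor not in {"snappy", "zlib"}:
        raise ValueError(f"unsupported compressor: {compressor}")
    return {"wiredTiger": {"configString": f"block_compressor={compressor}"}}


options = wiredtiger_options("zlib")
print(options)

# With a running server and PyMongo installed, the option could be passed like this:
#   from pymongo import MongoClient
#   db = MongoClient()["test"]                        # hypothetical database name
#   db.create_collection("people", storageEngine=options)
```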

answered Sep 21 '22 00:09

Sean Reilly
