The MongoDB Application FAQ mentions that short field names are a technique that can be used for small documents. This led me to thinking, "what's a small document anyway?"
I'm using pymongo. Is there any way I can write some Python to scan a collection and get a feel for the ratio of bytes used for field names to bytes used for actual field data?
I'm also tangentially curious about what the basic byte overhead is per document.
The db.collection.stats() method returns statistics about the collection. Its scale option sets the unit used for sizes in the output. By default, sizes are shown in bytes. To display kilobytes rather than bytes, specify a scale value of 1024.
The db.collection.totalSize() method reports the total size of a collection, including the size of all documents and all indexes on the collection. Returns: the total size in bytes of the data in the collection plus the size of every index on the collection.
The db.stats() method returns a document that reports on the state of the current database. Its scale option sets the unit in which sizes are reported. Unless specified, all sizes are returned in bytes.
There is no built-in way to get the ratio of space used for keys in BSON documents versus space used for actual field values. However, the collstats and dbstats commands can give you useful information on collection and database size. Here's how to use them in pymongo:
from pymongo import MongoClient
# connect to a MongoDB server on localhost with the default port
client = MongoClient()
db = client.test
# print collection statistics for the "events" collection
print(db.command("collstats", "events"))
# print database statistics
print(db.command("dbstats"))
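The scale option described above can be passed to the same command from pymongo as a keyword argument. The following is a minimal sketch, not an official helper: the function name `summarize_collection` is hypothetical, and it assumes the collstats reply contains the `count`, `size`, `avgObjSize` and `totalIndexSize` fields. The `avgObjSize` figure gives a rough feel for how big a typical document is.

```python
def summarize_collection(db, name, scale=1):
    """Return a few size figures for one collection.

    `db` is assumed to be a pymongo Database object.
    `scale` is forwarded to collstats (1 = bytes, 1024 = kilobytes).
    """
    stats = db.command("collstats", name, scale=scale)
    return {
        "count": stats["count"],                    # number of documents
        "size": stats["size"],                      # uncompressed data size
        "avgObjSize": stats.get("avgObjSize"),      # may be absent if the collection is empty
        "totalIndexSize": stats["totalIndexSize"],  # size of all indexes
    }

# Usage against a live server (assumes a local mongod and an "events" collection):
#   from pymongo import MongoClient
#   print(summarize_collection(MongoClient().test, "events", scale=1024))
```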
You could always hack something up to get a pretty good estimate though. If all of your documents in a collection have the same schema, then something like this isn't half bad:
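One way to sketch such an estimate, under the assumption that a single sample document is representative of the whole collection: count the bytes its field names occupy, multiply by the document count, and divide by the data size reported by collstats. In BSON each field name is stored as a UTF-8 string followed by one NUL byte, which is where the `+ 1` comes from. The helper names below are hypothetical, and the result is only an approximation.

```python
def field_name_bytes(doc):
    """Bytes used by field names in one document, nested documents included."""
    total = 0
    for key, value in doc.items():
        # BSON stores each name as a UTF-8 C string: its bytes plus a NUL terminator
        total += len(key.encode("utf-8")) + 1
        if isinstance(value, dict):
            total += field_name_bytes(value)
        elif isinstance(value, list):
            # count the names inside any sub-documents held in an array
            total += sum(field_name_bytes(v) for v in value if isinstance(v, dict))
    return total

def field_name_ratio(sample_doc, doc_count, data_size):
    """Estimated share of data_size taken by field names, given one sample document."""
    if not data_size:
        return 0.0
    return field_name_bytes(sample_doc) * doc_count / data_size

def estimate_for_collection(db, name):
    """Same estimate for a live collection; `db` is assumed to be a pymongo Database."""
    stats = db.command("collstats", name)
    sample = db[name].find_one()
    if sample is None:
        return 0.0
    return field_name_ratio(sample, stats["count"], stats["size"])

# Usage against a live server (assumes a local mongod and an "events" collection):
#   from pymongo import MongoClient
#   d = estimate_for_collection(MongoClient().test, "events")
```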
Now d is the proportion of the total data size of the collection which is used to store field names.