pymongo method of getting statistics for collection byte usage?

1 Answer

There is no built-in way to get the ratio of space used for field names (keys) in BSON documents to space used for the actual field values. However, the collStats and dbStats commands report useful information on collection and database size. Here's how to run them from pymongo:

from pymongo import MongoClient

# with no arguments, MongoClient connects to localhost:27017
client = MongoClient()
db = client.test

# print collection statistics
print(db.command("collstats", "events"))

# print database statistics
print(db.command("dbstats"))

You could always hack something up to get a pretty good estimate, though. If all of the documents in a collection share the same schema, something like this works reasonably well:

Count up the total number of characters in the field names of a document, and call this number a.
Add one to a for each field in order to account for the terminating character. Let the result be b.
Multiply b by the number of documents in the collection, and let the result be denoted by c.
Divide c by the "size" field returned by collStats (this assumes collStats reports size in bytes, which is the default when no scale is given). Let this value be d.

Now d is the proportion of the collection's total data size that is used to store field names.
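The steps above can be sketched as a small helper. This is only a rough illustration, not part of pymongo: the function name field_name_fraction and its parameters are made up here, it looks at top-level field names only, and it counts UTF-8 bytes rather than characters (the two are equal for ASCII names).

```python
def field_name_fraction(sample_doc, doc_count, data_size_bytes):
    """Estimate the fraction of a collection's data size spent on field names.

    Hypothetical helper: assumes every document has the same top-level
    fields as sample_doc, and ignores names inside nested documents.
    """
    if data_size_bytes <= 0:
        raise ValueError("data_size_bytes must be positive")
    # a: total bytes in the field names (BSON stores names as UTF-8)
    a = sum(len(name.encode("utf-8")) for name in sample_doc)
    # b: one extra byte per field for the terminating null character
    b = a + len(sample_doc)
    # c: scale up to the whole collection
    c = b * doc_count
    # d: proportion of the total data size
    return c / data_size_bytes


# Possible usage against a live server (needs the "events" collection):
#   stats = db.command("collstats", "events")
#   d = field_name_fraction(db.events.find_one(), stats["count"], stats["size"])
```

The "count" and "size" keys are the document count and data size reported by collStats; the sample document is simply whatever find_one() returns, so the estimate is only as good as the same-schema assumption.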

answered Sep 30 '22 19:09

david.storch

Tags: python, python-3.x, mongodb, pymongo

Travis Griggs
