Why is PyMongo count_documents slower than count?

2 Answers

This is not specific to PyMongo; it is how MongoDB itself behaves.

count is a native MongoDB command. When called without a query, it doesn't actually count the documents. MongoDB keeps the total number of documents in the collection's metadata and updates it whenever you insert or delete a record, so count simply returns that stored value.

count_documents takes a query object and always evaluates it against the data, which means it has to go through the records to produce the total. Because you're not passing any filter, it has to go over all 60 million records. This is why it is slow.
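As a rough mental model of that difference, here is a toy sketch in plain Python. It is not how MongoDB is implemented, and the class and method names are made up for illustration. It compares a counter maintained on every write with a count that has to visit every document:

```python
class ToyCollection:
    """Toy in-memory stand-in for a collection (hypothetical, illustration only)."""

    def __init__(self):
        self._docs = []
        self._stored_count = 0  # plays the role of the collection metadata

    def insert(self, doc):
        self._docs.append(doc)
        self._stored_count += 1  # kept up to date on every write

    def delete(self, predicate):
        kept = [d for d in self._docs if not predicate(d)]
        self._stored_count -= len(self._docs) - len(kept)
        self._docs = kept

    def metadata_count(self):
        # O(1): return the stored value, analogous to count() with no query
        return self._stored_count

    def scanning_count(self, predicate=lambda doc: True):
        # O(n): visit every document, analogous to count_documents({})
        return sum(1 for d in self._docs if predicate(d))
```

Both methods return the same number here, but metadata_count does constant work while scanning_count grows with the size of the collection.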

Based on @Stennie's comment:

You can use estimated_document_count() in PyMongo 3.7+ to return the fast count based on collection metadata. The original count() was deprecated because its behaviour differed (estimated vs actual count) depending on whether query criteria were provided. The newer driver API is more intentional about the outcome.
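A minimal sketch of choosing between the two calls, assuming collection is a PyMongo 3.7+ Collection object. The helper name count_fast_or_exact is hypothetical and not part of PyMongo:

```python
def count_fast_or_exact(collection, query=None):
    """Hypothetical helper: pick the counting method based on the query.

    `collection` is assumed to expose the PyMongo 3.7+ methods
    estimated_document_count() and count_documents(filter).
    """
    if not query:
        # No filter: use the fast, metadata-based estimate.
        return collection.estimated_document_count()
    # With a filter: run the exact (and potentially slower) count.
    return collection.count_documents(query)
```

With a real client this might be called as count_fast_or_exact(db.my_collection) for the fast total, or count_fast_or_exact(db.my_collection, {"status": "active"}) for an exact filtered count. The collection and field names here are placeholders. Keep in mind that the first form returns an estimate, so use count_documents({}) when you need the exact total.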

answered Sep 30 '22 19:09

Amit Wagner

As already mentioned here, the behavior is not specific to PyMongo.

The reason is that the count_documents method in PyMongo performs an aggregation query and does not use any metadata. See collection.py#L1670-L1688:

pipeline = [{'$match': filter}]
if 'skip' in kwargs:
    pipeline.append({'$skip': kwargs.pop('skip')})
if 'limit' in kwargs:
    pipeline.append({'$limit': kwargs.pop('limit')})
pipeline.append({'$group': {'_id': None, 'n': {'$sum': 1}}})
cmd = SON([('aggregate', self.__name),
           ('pipeline', pipeline),
           ('cursor', {})])
if "hint" in kwargs and not isinstance(kwargs["hint"], string_type):
    kwargs["hint"] = helpers._index_document(kwargs["hint"])
collation = validate_collation_or_none(kwargs.pop('collation', None))
cmd.update(kwargs)
with self._socket_for_reads(session) as (sock_info, slave_ok):
    result = self._aggregate_one_result(
        sock_info, slave_ok, cmd, collation, session)
if not result:
    return 0
return result['n']

This command has the same behavior as the collection.countDocuments method.
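To make the shape of that aggregation easier to see, here is a standalone sketch of the pipeline construction from the excerpt above. The function name build_count_pipeline is hypothetical, and this is a simplification for illustration, not PyMongo's actual code path:

```python
def build_count_pipeline(query, skip=None, limit=None):
    """Hypothetical helper mirroring the pipeline built in the excerpt."""
    # $match applies the filter; an empty filter {} matches every document.
    pipeline = [{"$match": query}]
    if skip is not None:
        pipeline.append({"$skip": skip})
    if limit is not None:
        pipeline.append({"$limit": limit})
    # $group with _id None folds all remaining documents into one result
    # whose field n holds how many documents reached this stage.
    pipeline.append({"$group": {"_id": None, "n": {"$sum": 1}}})
    return pipeline
```

The resulting list could be passed to collection.aggregate(...), reading the count from the n field of the single result document and treating an empty result as zero, as the excerpt does. Because every matching document has to flow through $group, the cost grows with the number of matches.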

That being said, if you are willing to trade accuracy for performance, you can use the estimated_document_count method, which instead sends a count command to the database, with the same behavior as collection.estimatedDocumentCount. See collection.py#L1609-L1614:

if 'session' in kwargs:
    raise ConfigurationError(
        'estimated_document_count does not support sessions')
cmd = SON([('count', self.__name)])
cmd.update(kwargs)
return self._count(cmd)

Here, self._count is a helper that sends the command.

answered Sep 30 '22 17:09

styvane

Tags:

mongodb

mongodb-query

pymongo

pymongo-3.x

Threegirl
