This sounds odd, and I hope I am doing something wrong, but my MongoDB
collection is returning the Count
off by one in my collection.
I have a collection with (I am sure) 359671 documents. However the count()
command returns 359670 documents.
I am executing the count()
command using the mongo shell:
rs0:PRIMARY> db.COLLECTION.count()
359670
This is incorrect.
It is not finding each and every document in my collection.
If I provide the following query to count, I get the correct result:
rs0:PRIMARY> db.COLLECTION.count({_id: {$exists: true}})
359671
I believe this is a bug in WiredTiger. As far as I am aware each document has the same definition, an _id field of an integer ranging from 0 to 359670, and a BinData field. I did not have this problem with the older storage engine (or Mongo 2, either could have caused the issue).
Is this something I have done wrong? I do not want to use the {_id: {$exists: true}}
query as that takes 100x longer to complete.
count() method is used to return the count of documents that would match a find() query. The db. collection. count() method does not perform the find() operation but instead counts and returns the number of results that match a query.
The count() method counts the number of documents that match the selection criteria. It returns the number of documents that match the selection criteria. It takes two arguments first one is the selection criteria and the other is optional.
Method 2: count_documents() Alternatively, you can also use count_documents() function in pymongo to count the number of documents present in the collection. Example: Retrieves the documents present in the collection and the count of the documents using count_documents().
The Atlas Search count option adds a field to the metadata results document that displays a count of the search results for the query. You can use count to determine the size of the result set. You can use it in the $search or $searchMeta stage.
As now stated in the doc, db.collection.count()
without using a query parameter, returns results based on the collection’s metadata:
This may result in an approximate count. In particular:
On a sharded cluster, the resulting count will not correctly filter out orphaned documents.
After an unclean shutdown, the count may be incorrect.
When using a query parameter, as you did in the second query ({_id: {$exists: true}}
), then it forces count
to not use the collection's metadata, but to scan the collection instead.
Starting Mongo 4.0.3
, count()
is considered deprecated and the following alternatives are recommended instead:
db.collection.countDocuments({})
which under the hood actually performs the following "expensive", but accurate aggregation (expensive since the whole collection is scanned to count records):
db.collection.aggregate([{ $group: { _id: null, n: { $sum: 1 } } }])
db.collection.estimatedDocumentCount()
which performs exactly what db.collection.count()
does/did (it's actually a wrapper around count
), which uses the collection’s metadata.
This is thus almost instantaneous, but may lead to an approximate result in the particular cases mentioned above.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With