Mongo count really slow when there are millions of records

Tags:

mongodb

//FAST
db.datasources.find().count()
12036788

//SLOW
db.datasources.find({nid:19882}).count()
10161684

There is an index on nid.

Any way to make the second query faster? (It is taking about 8 seconds)

Chris Muench asked Mar 19 '12 21:03



1 Answer

Count queries, indexed or otherwise, are slow because MongoDB still has to do a full b-tree walk to find the number of documents that match your criteria. The reason is that the MongoDB b-tree structure is not "counted", meaning a node does not store the number of elements in the node or its subtree.

The issue is reported at https://jira.mongodb.org/browse/SERVER-1752. There is currently no workaround to improve performance other than manually maintaining a counter for that collection, which obviously comes with a few downsides.
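A manually maintained counter could look roughly like the sketch below. This is an illustration under assumptions, not an official recipe: the datasource_counts collection and the three helper functions are hypothetical names, and db is expected to be a mongo shell database handle. It uses the legacy shell methods insert, update, remove and findOne (newer shells deprecate some of these in favour of insertOne, updateOne and deleteOne).

```javascript
// Sketch only: `datasource_counts` and these helper names are hypothetical.
// `db` is a mongo shell database handle (or any object with the same methods).

// Insert a document and bump the counter document for its nid.
function insertDatasource(db, doc) {
  db.datasources.insert(doc);
  db.datasource_counts.update(
    { _id: doc.nid },          // one counter document per nid
    { $inc: { count: 1 } },
    { upsert: true }           // create the counter on first use
  );
}

// Remove one document by _id and decrement the counter for its nid.
// The decrement is unconditional; the caller must know the document existed.
function removeDatasource(db, id, nid) {
  db.datasources.remove({ _id: id });
  db.datasource_counts.update({ _id: nid }, { $inc: { count: -1 } });
}

// Read the stored counter instead of running find({nid: ...}).count().
function countByNid(db, nid) {
  var counter = db.datasource_counts.findOne({ _id: nid });
  return counter ? counter.count : 0;
}
```

With this in place, countByNid(db, 19882) reads a single small document instead of walking the index. The downsides are the ones you would expect: every write path has to go through the helpers, the two writes are not atomic so the counter can drift if one of them fails, and existing data needs a one-time backfill using the slow count.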

Also note that the db.col.count() version (no criteria) can take a big shortcut and doesn't actually perform a query, hence its speed. That said, it does not always report the same value as a count query that should return all elements would (in sharded environments with high write throughput, for example, it won't). It's up for debate whether that's a bug. I think it is.

Note that in 2.3+ a significant optimization was introduced that should (and does) improve performance of counts on indexed fields. See https://jira.mongodb.org/browse/SERVER-7745

Remon van Vliet answered Oct 04 '22 01:10