I have a MongoDB collection of over 1,000,000 records, each around 20 KB, so the total collection size is around 20 GB.
Each document has a 'type' field that can take around 10 different values, and there is an index on 'type'. I would like to get the per-type counts for the collection.
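For reference, a minimal sketch of how such an index would be created, assuming pymongo 2.x (where ensure_index() was the idiom; newer drivers use create_index()):

# Single-field index on 'type'; a no-op if the index already exists.
my_db.my_colc.ensure_index('type')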
I've tested two different approaches (in Python/pymongo):
Approach 1: iterate the distinct type values and count each one:

counters = {}
for type_val in my_db.my_colc.distinct('type'):
    counters[type_val] = my_db.my_colc.find({'type': type_val}).count()
Approach 2: a single $group aggregation:

counters = my_db.my_colc.aggregate([
    {'$group': {'_id': '$type', 'agg_val': {'$sum': 1}}}
])
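Note that in pymongo 2.x (current when this question was asked), aggregate() by default returns the raw command response rather than a cursor, so getting the same dict shape as approach 1 takes one more step; a minimal sketch, assuming that 2.x behaviour:

# In pymongo 2.x the grouped documents sit under the 'result' key of
# the command response; pymongo 3.x returns a cursor you iterate directly.
response = my_db.my_colc.aggregate([
    {'$group': {'_id': '$type', 'agg_val': {'$sum': 1}}}
])
counters = {doc['_id']: doc['agg_val'] for doc in response['result']}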
The first approach is about two orders of magnitude faster than the second (roughly 1 minute vs. 45 minutes). This seems to be because count() can be satisfied from the index alone, without touching the documents, while $group has to examine the documents one by one.
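One way to check whether a query is indeed served from the index alone is explain(); a minimal sketch using the MongoDB 2.6-era explain format ('some_type' is a placeholder value):

# Projecting only indexed fields makes the query coverable; in the
# 2.6 explain output, 'indexOnly' is True when no documents were fetched.
plan = my_db.my_colc.find({'type': 'some_type'}, {'type': 1, '_id': 0}).explain()
print(plan.get('indexOnly'))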
Is there any way to run an efficient grouping query against the 'type' index alone, achieving the performance of approach 1 but using the aggregation framework?
I am using MongoDB 2.6.1.
Update: https://jira.mongodb.org/browse/SERVER-11447 is open for this issue in the MongoDB Jira.
In the aggregation pipeline, the $group stage does not use indexes. It is meant to come after a $match stage, which can use indexes to speed up the pipeline. See:
http://docs.mongodb.org/manual/core/aggregation-pipeline/#aggregation-pipeline-operators-and-performance
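For example, a $match placed first in the pipeline can be served by the index on 'type' to narrow the input, although $group still processes the matched documents one by one; a sketch (the type values are placeholders):

response = my_db.my_colc.aggregate([
    # An initial $match can use the index on 'type' to select input...
    {'$match': {'type': {'$in': ['type_a', 'type_b']}}},
    # ...but $group itself still walks the matched documents.
    {'$group': {'_id': '$type', 'agg_val': {'$sum': 1}}},
])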
Cheers,