If I do a count query, I get the results in under 2 seconds:
db.coll.find({"A":1,"createDate":{"$gt":new Date("2011-05-21"),"$lt":new Date("2013-08-21")}}).count()
This uses the following index:
db.coll.ensureIndex({"A":1,"createDate":1})
Similarly, there are four fields A, B, C, D (values are always 0 or 1) for which I run four count queries and get all the results in under 10 seconds.
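For reference, the four per-field counts look roughly like this (a sketch; the field names and date range are assumed to mirror the query above, each field backed by its own compound index):

```javascript
// One count query per field; ensureIndex is the 2.4-era API
// (later versions call it createIndex).
["A", "B", "C", "D"].forEach(function (field) {
    var query = {};
    query[field] = 1;
    query.createDate = {"$gt": new Date("2011-05-21"), "$lt": new Date("2013-08-21")};
    print(field + ": " + db.coll.find(query).count());
});
```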
I looked at the aggregation framework documentation and created an aggregated query to do all 4 sums together.
db.coll.aggregate(
    { $match : {"createDate":{$gt:new Date("2013-05-21"),$lt:new Date("2013-08-21")}} },
    { $group : {
        _id: null,
        totalA: {$sum: "$A"},
        totalB: {$sum: "$B"},
        totalC: {$sum: "$C"},
        totalD: {$sum: "$D"}
    }}
)
I also created an index:
db.coll.ensureIndex({"createDate":1,"A":1,"B":1,"C":1,"D":1})
According to the documentation, this index covers my aggregation. Yet the aggregate returns in ~18 seconds.
I'm confused here. Is there something basic I missed, or is there a fundamental reason why aggregation is slower than count()? I am also concerned about the overhead of firing multiple count queries from the application code just to fetch counts.
On large collections of millions of documents, MongoDB's aggregation has been shown to perform much worse than Elasticsearch. Performance degrades with collection size once MongoDB starts hitting the disk because system RAM is limited. A $lookup stage used without supporting indexes can be very slow. A plain find({...}).count() is faster than an equivalent aggregation.
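To illustrate the $lookup point, here is a sketch with hypothetical orders/customers collections (not from the question; note that $lookup was only added in MongoDB 3.2, after the 2.4.8 version discussed here):

```javascript
// Without an index on the foreign field, each $lookup below scans
// `customers` once per input document; index it first.
db.customers.createIndex({ customerId: 1 })
db.orders.aggregate([
    { $lookup: {
        from: "customers",
        localField: "customerId",
        foreignField: "customerId",
        as: "customer"
    }}
])
```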
The aggregation pipeline provides efficient data aggregation using native operations within MongoDB and is the preferred method for aggregating data. It can operate on a sharded collection, and it can use indexes to improve performance during some of its stages.
MongoDB Aggregation goes further though and can also perform relational-like joins, reshape documents, create new and update existing collections, and so on. While there are other methods of obtaining aggregate data in MongoDB, the aggregation framework is the recommended approach for most work.
Firstly, though not documented for 2.4.8, you can run an explain using the db.runCommand invocation:
db.runCommand({
    aggregate: "coll",
    pipeline: [
        { $match:
            {"createDate": {$gt: new Date("2013-05-21"), $lt: new Date("2013-08-21")}}
        },
        { $group: {
            _id: null,
            totalA: {$sum: "$A"},
            totalB: {$sum: "$B"},
            totalC: {$sum: "$C"},
            totalD: {$sum: "$D"}
        }}
    ],
    explain: true
})
Which will give you some insight into what is happening.
Also, and primarily, you are comparing apples to oranges.
When you issue a count() on a query, it uses the cursor result properties to get the number of documents that matched.
Under aggregation, you are selecting an extended match and then compacting all of those results into per-field sums. If your initial $match has lots of results, then all of them need to be crunched together by $sum.
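Conceptually, the $group stage has to visit every document that survived $match and fold it into the accumulators, roughly like this plain-JavaScript sketch (the sample documents are made up):

```javascript
// Simulate { $group: { _id: null, totalA: { $sum: "$A" }, ... } }
// over the documents that survived $match.
const matched = [
    { A: 1, B: 0, C: 1, D: 0 },
    { A: 0, B: 1, C: 1, D: 1 },
    { A: 1, B: 1, C: 0, D: 0 },
];

// Every matched document is touched once; the cost grows with the
// size of the $match result, unlike count(), which only needs the
// number of matches.
const totals = matched.reduce(
    (acc, doc) => ({
        totalA: acc.totalA + doc.A,
        totalB: acc.totalB + doc.B,
        totalC: acc.totalC + doc.C,
        totalD: acc.totalD + doc.D,
    }),
    { totalA: 0, totalB: 0, totalC: 0, totalD: 0 }
);

console.log(totals);
```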
Have a look at explain, and try to conceptually understand the differences. Aggregation is great for what you generally want it to do. But maybe this isn't the best use case.