
Understanding performance: mongo aggregation vs count

If I run a count query, I get the result in under 2 seconds:

db.coll.find({"A":1,"createDate":{"$gt":new Date("2011-05-21"),"$lt":new Date("2013-08-21")}}).count()

This uses the following index:

db.coll.ensureIndex({"A":1,"createDate":1})

Similarly, there are four fields, A, B, C and D (values are always 0 or 1), for which I run four count queries and get all the results in under 10 seconds.
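The four count queries described above can be sketched as follows. This is only an illustration: `buildCountFilter` is a hypothetical helper, not part of MongoDB, and the date range is copied from the count query shown earlier. Each field would presumably also need its own index analogous to the one on A and createDate.

```javascript
// Hypothetical helper (not a MongoDB API): builds the filter for one 0/1 field.
function buildCountFilter(field, from, to) {
  const filter = {};
  filter[field] = 1;                          // e.g. { "A": 1 }
  filter.createDate = { $gt: from, $lt: to }; // same date range for every field
  return filter;
}

// Date range taken from the count query in the question.
const from = new Date("2011-05-21");
const to = new Date("2013-08-21");

// One filter per field; each one becomes a separate count query.
const filters = ["A", "B", "C", "D"].map(function (field) {
  return buildCountFilter(field, from, to);
});

// In the mongo shell, each filter would then be passed to count(), for example:
//   db.coll.count(filters[0])
```

Building the filters in one place keeps the four queries identical apart from the field name, which makes it easier to compare their timings.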

I looked at the aggregation framework documentation and wrote an aggregation query to compute all four sums together:

db.coll.aggregate(
    { $match : { "createDate" : { $gt : new Date("2013-05-21"), $lt : new Date("2013-08-21") } } },
    { $group : {
        _id : null,
        totalA : { $sum : "$A" },
        totalB : { $sum : "$B" },
        totalC : { $sum : "$C" },
        totalD : { $sum : "$D" }
    } }
)

I also created an index:

db.coll.ensureIndex({"createDate":1,"A":1,"B":1,"C":1,"D":1})

According to the documentation, this index covers my aggregation query. Yet the aggregate takes about 18 seconds to return.

I'm confused here. Did I miss something basic, or is there a fundamental reason why aggregation is slower than count? I am also concerned about the overhead of firing several separate count queries at mongo from my code.

crazydiv asked Feb 21 '14 06:02



1 Answer

First, although it is not documented for 2.4.8, you can run an explain on the aggregation through the db.runCommand invocation:

db.runCommand({
    aggregate: "coll",
    pipeline: [      
        { $match : 
            {"createDate":{$gt:new Date("2013-05-21"),$lt:new Date("2013-08-21")} } 
        },
        { $group : { 
              _id:null,
              totalA: {$sum :"$A"},
              totalB: {$sum: "$B"},
              totalC: {$sum: "$C"},
              totalD: {$sum: "$D"}
        }} 
    ],
    explain: true
})

This will give you some insight into what is happening.

More importantly, you are comparing apples to oranges.

When you issue a count() on a query, it is using the cursor result properties to get the number of documents that matched.

Under aggregation, you are selecting a wider match (your $match filters on createDate only, not on A) and then compacting all of those results into sums over every matched document. If your initial $match has lots of results, then all of them need to be crunched together with $sum.
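The difference can be modelled in plain JavaScript. This is a conceptual sketch only, not how the server implements either operation: `matched`, `countWhere` and `sumAll` are made-up names, and the three sample documents are invented for illustration.

```javascript
// Invented sample data standing in for documents that passed the date-range $match.
const matched = [
  { A: 1, B: 0, C: 1, D: 0 },
  { A: 0, B: 0, C: 1, D: 1 },
  { A: 1, B: 1, C: 1, D: 0 }
];

// count-style: only the number of entries that match the condition is needed.
function countWhere(docs, field) {
  return docs.filter(function (d) { return d[field] === 1; }).length;
}

// $group/$sum-style: every matched document is visited and every field value is read and added.
function sumAll(docs, fields) {
  const totals = {};
  fields.forEach(function (f) { totals["total" + f] = 0; });
  docs.forEach(function (d) {
    fields.forEach(function (f) { totals["total" + f] += d[f]; });
  });
  return totals;
}

const totals = sumAll(matched, ["A", "B", "C", "D"]);
const countA = countWhere(matched, "A");
```

Because the fields only hold 0 or 1, `totals.totalA` and `countA` come out equal. The answers are the same; what differs is how much data has to be touched to produce them.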

Have a look at explain, and try to conceptually understand the differences. Aggregation is great for what you generally want it to do. But maybe this isn't the best use case.

Neil Lunn answered Oct 05 '22 05:10