Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Aggregate framework can't use indexes

I run this command:

db.ads_view.aggregate({$group: {_id : "$campaign", "action" : {$sum: 1} }});

ads_view : 500 000 documents.

this queries take 1.8s . this is its profile : https://gist.github.com/afecec63a994f8f7fd8a

indexed : db.ads_view.ensureIndex({campaign: 1});

But mongodb don't use index. Anyone know if can aggregate framework use indexes, how to index this query.

like image 790
meotimdihia Avatar asked Nov 20 '12 03:11

meotimdihia


People also ask

Does aggregate use index?

Indexes can cover queries in an aggregation pipeline. A covered query uses an index to return all of the documents and has high performance.

Is MongoDB good for aggregation?

As with many other database systems, MongoDB allows you to perform a variety of aggregation operations. These allow you to process data records in a variety of ways, such as grouping data, sorting data into a specific order, or restructuring returned documents, as well as filtering data as one might with a query.

Is MongoDB aggregation fast?

On large collections of millions of documents, MongoDB's aggregation was shown to be much worse than Elasticsearch. Performance worsens with collection size when MongoDB starts using the disk due to limited system RAM. The $lookup stage used without indexes can be very slow.

What is aggregation framework in MongoDB?

Aggregation in MongoDB allows for the transforming of data and results in a more powerful fashion than from using the find() command. Through the use of multiple stages and expressions, you are able to build a "pipeline" of operations on your data to perform analytic operations.


2 Answers

This is a late answer, but since $group in Mongo as of version 4.0 still won't make use of indexes, it may be helpful for others.

To speed up your aggregation significantly, performe a $sort before $group.

So your query would become:

db.ads_view.aggregate({$sort:{"campaign":1}},{$group: {_id : "$campaign", "action" : {$sum: 1} }});

This assumes an index on campaign, which should have been created according to your question. In Mongo 4.0, create the index with db.ads_view.createIndex({campaign:1}).

I tested this on a collection containing 5.5+ Mio. documents. Without $sort, the aggregation would not have finished even after several hours; with $sort preceeding $group, aggregation is taking a couple of seconds.

like image 183
sebastian Avatar answered Oct 17 '22 19:10

sebastian


The $group operator is not one of the ones that will use an index currently. The list of operators that do (as of 2.2) are:

$match
$sort
$limit
$skip

From here:

http://docs.mongodb.org/manual/applications/aggregation/#pipeline-operators-and-indexes

Based on the number of yields going on in the gist, I would assume you either have a very active instance or that a lot of this data is not in memory when you are doing the group (it will yield on page fault usually too), hence the 1.8s

Note that even if $group could use an index, and your index covered everything being grouped, it would still involve a full scan of the index to do the group, and would likely not be terrible fast anyway.

like image 28
Adam Comerford Avatar answered Oct 17 '22 19:10

Adam Comerford