Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mongodb Aggregation Framework: Does $group use index?

I'm trying to use aggregation framework with $match and $group stages. Does $group stage use index data? I'm using latest available mongodb version - 2.5.4

like image 588
fedor.belov Avatar asked Dec 08 '13 16:12

fedor.belov


People also ask

Does aggregation use index?

If the added $match stage is at the start of a pipeline, the aggregation can use an index as well as query the collection to limit the number of documents that enter the pipeline. See Improve Performance with Indexes and Document Filters for more information.

How does MongoDB $group work?

The $group stage separates documents into groups according to a "group key". The output is one document for each unique group key. A group key is often a field, or group of fields. The group key can also be the result of an expression.

What field is mandatory in a $Group operation in MongoDB?

_id: This field is mandatory for grouping. If you specify the value of the _id field as null or a constant, the $group operator counts the accumulated values for all input documents as a whole.

What passes through a MongoDB aggregation pipeline?

Each stage of the aggregation pipeline transforms the document as the documents pass through it. However, once an input document passes through a stage, it doesn't necessarily produce one output document. Some stages may generate more than one document as an output. MongoDB provides the db.


2 Answers

$group does not use index data.

From the mongoDB docs:

The $match and $sort pipeline operators can take advantage of an index when they occur at the beginning of the pipeline.

The $geoNear pipeline operator takes advantage of a geospatial index. When using $geoNear, the $geoNear pipeline operation must appear as the first stage in an aggregation pipeline.

like image 112
4J41 Avatar answered Sep 29 '22 11:09

4J41


As 4J41's answer says, $group does not (directly) use an index, although $sort does if it is the first stage in the pipeline. However, it seems possible that $group could, in principle, have an optimised implementation if it immediately follows a $sort, in which case you could make it effectively make use of an index by putting a $sort before hand.

There does not seem to be a straight answer either way in the docs about whether $group has this optimisation (although I bet there would be if it did, so this suggests it doesn't). The answer is in MongoDB bug 4507: currently $group does NOT have this implementation, so the top line of 4J41's answer is right after all. If you really need efficiency, depending on the application it may be quickest to use a regular query and do the grouping in your client code.

Edit: As sebastian's answer says, it seems that in practice using $sort (that can take advantage of an index) before a $group can make a very large speed improvement. The bug above is still open so it seems that it is not making the absolute best possible advantage of the index (that is, starting to group items as items are loaded, rather than loading them all in memory first). But it is still certainly worth doing.

like image 40
Arthur Tacca Avatar answered Sep 29 '22 10:09

Arthur Tacca