I'm trying to use the aggregation framework with $match and $group stages. Does the $group stage use index data? I'm using the latest available MongoDB version, 2.5.4.
If the added $match stage is at the start of a pipeline, the aggregation can use an index as well as query the collection to limit the number of documents that enter the pipeline. See Improve Performance with Indexes and Document Filters for more information.
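As a rough illustration of the point above, here is a minimal sketch of a pipeline whose first stage is a $match. The collection name `orders` and the fields `status`, `customerId` and `amount` are made up for the example. The `typeof db` guard just lets the snippet load outside the mongo shell; inside the shell it creates the index and asks for the explain output so you can check whether the index is actually used.

```javascript
// Hypothetical collection "orders" with fields "status", "customerId", "amount".
const matchThenGroup = [
  // $match comes first, so it can use an index on { status: 1 } if one exists.
  { $match: { status: "shipped" } },
  // $group then only sees the documents that passed the $match.
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } }
];

// Only runs inside the mongo shell, where the global `db` is defined.
if (typeof db !== "undefined") {
  // Older shells use ensureIndex() instead of createIndex().
  db.orders.createIndex({ status: 1 });
  // Ask the server how it would execute the pipeline.
  printjson(db.orders.aggregate(matchThenGroup, { explain: true }));
}
```

The exact shape of the explain output differs between server versions, so treat it as something to inspect rather than parse.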
The $group stage separates documents into groups according to a "group key". The output is one document for each unique group key. A group key is often a field, or group of fields. The group key can also be the result of an expression.
_id: This field is mandatory for grouping. If you specify the value of the _id field as null or a constant, the $group stage calculates the accumulated values for all input documents as a whole.
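To make the different kinds of group key concrete, here is a small sketch of four $group stages. The field names (`status`, `customerId`, `amount`, `createdAt`) and the `orders` collection are assumptions for the example, not something from the question.

```javascript
// Group key is a single field: one output document per distinct "status".
const byStatus = { $group: { _id: "$status", count: { $sum: 1 } } };

// Group key is a group of fields: the key is a sub-document.
const byStatusAndCustomer = {
  $group: {
    _id: { status: "$status", customer: "$customerId" },
    total: { $sum: "$amount" }
  }
};

// Group key is the result of an expression: the year of a date field.
const byYear = { $group: { _id: { $year: "$createdAt" }, count: { $sum: 1 } } };

// Group key is null: every input document lands in one single group.
const overall = { $group: { _id: null, count: { $sum: 1 }, total: { $sum: "$amount" } } };

// Only runs inside the mongo shell, where the global `db` is defined.
if (typeof db !== "undefined") {
  printjson(db.orders.aggregate([overall]).toArray());
}
```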
Each stage of the aggregation pipeline transforms the documents as they pass through it. However, an input document that passes through a stage does not necessarily produce exactly one output document: some stages generate more than one output document for a single input. MongoDB provides the db.
$group does not use index data.
From the MongoDB docs:
The $match and $sort pipeline operators can take advantage of an index when they occur at the beginning of the pipeline.
The $geoNear pipeline operator takes advantage of a geospatial index. When using $geoNear, the $geoNear pipeline operation must appear as the first stage in an aggregation pipeline.
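For the $geoNear case, a sketch might look like the following. The `places` collection, its `location` and `category` fields, and the coordinates are all invented for the example; the stage relies on a geospatial index on the collection, and it has to be the first stage.

```javascript
// Hypothetical collection "places" with a GeoJSON "location" field and a "category" field.
const nearbyByCategory = [
  {
    // $geoNear must be the first stage; it uses the geospatial index.
    $geoNear: {
      near: { type: "Point", coordinates: [-73.99, 40.73] },
      distanceField: "distance", // output field that receives the computed distance
      maxDistance: 2000,         // metres, because `near` is a GeoJSON point
      spherical: true
    }
  },
  // Count the nearby places per category.
  { $group: { _id: "$category", count: { $sum: 1 } } }
];

// Only runs inside the mongo shell, where the global `db` is defined.
if (typeof db !== "undefined") {
  db.places.createIndex({ location: "2dsphere" });
  printjson(db.places.aggregate(nearbyByCategory).toArray());
}
```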
As 4J41's answer says, $group does not (directly) use an index, although $sort does if it is the first stage in the pipeline. However, it seems possible that $group could, in principle, have an optimised implementation when it immediately follows a $sort, in which case you could make it use an index indirectly by putting a $sort beforehand.
The docs do not seem to give a straight answer either way about whether $group has this optimisation (although I bet they would if it did, so this suggests it doesn't). The answer is in MongoDB bug 4507: currently $group does NOT have this implementation, so the top line of 4J41's answer is right after all. If you really need efficiency, depending on the application it may be quickest to use a regular query and do the grouping in your client code.
Edit: As sebastian's answer says, it seems that in practice using a $sort (one that can take advantage of an index) before a $group can make a very large speed improvement. The bug above is still open, so it seems that $group is not taking the best possible advantage of the index (that is, starting to group items as they are loaded, rather than loading them all into memory first). But it is still certainly worth doing.
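A sketch of the $sort-then-$group arrangement, again with a made-up `orders` collection and `status`/`amount` fields. Whether it actually helps will depend on your server version and data, so it is worth timing it against the plain $group on your own collection.

```javascript
// Hypothetical collection "orders" with fields "status" and "amount".
const sortThenGroup = [
  // $sort is first, so it can walk an index on { status: 1 }.
  { $sort: { status: 1 } },
  // $group receives the documents already ordered by the group key.
  { $group: { _id: "$status", total: { $sum: "$amount" } } }
];

// Only runs inside the mongo shell, where the global `db` is defined.
if (typeof db !== "undefined") {
  // Older shells use ensureIndex() instead of createIndex().
  db.orders.createIndex({ status: 1 });
  // Check that the sort is served by the index before trusting any timings.
  printjson(db.orders.aggregate(sortThenGroup, { explain: true }));
}
```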