Aggregation pipeline and indexes

According to http://docs.mongodb.org/manual/core/indexes/#multikey-indexes, an index can be created on an array field as a multikey index. http://docs.mongodb.org/manual/applications/aggregation/#pipeline-operators-and-indexes lists some ways an index can be used in the aggregation framework. However, there may be times when I need to perform an $unwind on an array field before a $group. My question is: can multikey indexes (or any index on such an array field) still be used once that field has been operated on in the middle of the pipeline?

MervS asked Mar 25 '13 03:03

1 Answer

Generally, only pipeline operators that can be flattened into a normal query ($match, $limit, $sort, and $skip) can use a collection's indexes, and only when they appear at the start of the pipeline, before any stage that reshapes the documents. This is one of the reasons the $geoNear operator, added in 2.4, has to be the first stage of the pipeline.

Once you mutate the documents with $project, $group, or $unwind, the index is no longer usable for the stages that follow.

If you have an index on an array field, you can still use it before the $unwind to speed up the selection of documents that enter the pipeline, and then refine the unwound documents with a second $match.
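As a rough illustration of this rule of thumb, the sketch below (plain JavaScript, no MongoDB required) walks a pipeline from the start and collects the leading stages whose operator is $match, $sort, $limit, or $skip, stopping at the first stage that reshapes documents. The helper name `indexablePrefix` is hypothetical, and it models only the heuristic described here, not MongoDB's actual planner, which has additional rules (for example $geoNear) and, in newer versions, can reorder some stages before deciding.

```javascript
// Operators that can be flattened into an ordinary query (per the rule of thumb above).
const QUERY_LIKE = new Set(['$match', '$sort', '$limit', '$skip']);

// Hypothetical helper: return the leading stages that are candidates for index use.
// It stops at the first stage (e.g. $unwind, $project, $group) that reshapes documents.
function indexablePrefix(pipeline) {
  const prefix = [];
  for (const stage of pipeline) {
    const op = Object.keys(stage)[0];
    if (!QUERY_LIKE.has(op)) break;
    prefix.push(stage);
  }
  return prefix;
}

// Filter, unwind, then filter again on the array field.
const pipeline = [
  { $match: { tags: /^b/ } },
  { $unwind: '$tags' },
  { $match: { tags: /^b/ } },
];

console.log(indexablePrefix(pipeline).length); // 1: only the first $match qualifies
```

The second $match is identical to the first, yet it falls outside the prefix purely because of its position after the $unwind.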

Consider documents like:

{ tags: [ 'cat', 'bird', 'blue' ] }

Suppose there is an index on tags, which MongoDB builds as a multikey index because tags holds an array.

If you only wanted to group the tags starting with b, you could perform an aggregation like:

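{ pipeline: [
      { $match : { tags : /^b/ } },   // coarse filter: can use the index on tags
      { $unwind : '$tags' },          // one output document per array element
      { $match : { tags : /^b/ } },   // fine filter: no index available at this point
      /* the rest */
  ] }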

The first $match does the coarse-grained match using the index on tags.

The second $match, after the $unwind, cannot use the index (the document above has become 3 documents), but it evaluates each of those documents and filters out the extra ones the $unwind created (in the example, it removes { tags : 'cat' }).
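To make the document counts concrete, here is an in-memory sketch of the filter, unwind, filter sequence in plain JavaScript. The functions `match`, `unwind`, and `groupCount` are hypothetical, simplified stand-ins for the pipeline stages, not MongoDB's implementation; the `{ tags: ['dog'] }` document and the counting $group (`{ _id: '$tags', count: { $sum: 1 } }`) are assumptions added for illustration.

```javascript
// Simplified $match on one field against a regex. An array field matches if any
// element matches, mirroring how { tags: /^b/ } behaves on an array in MongoDB.
function match(docs, field, regex) {
  return docs.filter((doc) => {
    const value = doc[field];
    return Array.isArray(value) ? value.some((v) => regex.test(v)) : regex.test(value);
  });
}

// Simplified $unwind: emit one copy of the document per array element.
function unwind(docs, field) {
  return docs.flatMap((doc) => doc[field].map((v) => ({ ...doc, [field]: v })));
}

// Assumed grouping stage: count documents per value, like { _id: '$tags', count: { $sum: 1 } }.
function groupCount(docs, field) {
  const counts = new Map();
  for (const doc of docs) counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  return [...counts].map(([id, count]) => ({ _id: id, count }));
}

// The example document plus a hypothetical one that the first match excludes.
const docs = [{ tags: ['cat', 'bird', 'blue'] }, { tags: ['dog'] }];

const coarse = match(docs, 'tags', /^b/);  // 1 document: the step an index could serve
const unwound = unwind(coarse, 'tags');    // 3 documents: cat, bird, blue
const fine = match(unwound, 'tags', /^b/); // 2 documents: bird, blue
console.log(groupCount(fine, 'tags'));     // [ { _id: 'bird', count: 1 }, { _id: 'blue', count: 1 } ]
```

Dropping the second `match` call would let the 'cat' document reach the grouping step, which is exactly the extra document the second $match exists to remove.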

HTH - Rob.

Rob Moore answered Oct 01 '22 06:10