
MongoDB explain for Aggregation framework

Is there an explain function for the Aggregation framework in MongoDB? I can't see it in the documentation.

If not, is there some other way to check how a query performs within the aggregation framework?

I know with find you just do

db.collection.find().explain() 

But with the aggregation framework I get an error

db.collection.aggregate(
    { $project : { "Tags._id" : 1 }},
    { $unwind : "$Tags" },
    { $match: {$or: [{"Tags._id":"tag1"},{"Tags._id":"tag2"}]}},
    {
        $group:
        {
            _id : { id: "$_id"},
            "count": { $sum:1 }
        }
    },
    { $sort: {"count":-1}}
).explain()
SCB asked Oct 03 '12 04:10


1 Answer

Starting with MongoDB version 3.0, simply changing the order from

collection.aggregate(...).explain() 

to

collection.explain().aggregate(...) 

will give you the desired results (see the db.collection.explain() documentation).

For older versions >= 2.6, you will need to use the explain option for aggregation pipeline operations

explain:true

db.collection.aggregate([
    { $project : { "Tags._id" : 1 }},
    { $unwind : "$Tags" },
    { $match: {$or: [{"Tags._id":"tag1"},{"Tags._id":"tag2"}]}},
    { $group: {
        _id : "$_id",
        count: { $sum:1 }
    }},
    {$sort: {"count":-1}}
  ],
  {
    explain:true
  }
)
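
If the same script has to work against both kinds of server, the two forms above can be wrapped in one function. This is only a sketch: explainAggregate is a hypothetical helper name, not part of the shell, and the check on coll.explain is an assumed way to tell the two shell generations apart. The shell call at the end is guarded so the snippet also loads outside the mongo shell.

```javascript
// Hypothetical helper (not a MongoDB API): picks whichever explain form
// the given collection object supports.
function explainAggregate(coll, pipeline) {
  if (typeof coll.explain === "function") {
    // MongoDB 3.0+ shell: explain() returns an object with an aggregate() method.
    return coll.explain().aggregate(pipeline);
  }
  // MongoDB 2.6 shell: pass explain as an option to aggregate().
  return coll.aggregate(pipeline, { explain: true });
}

var pipeline = [
  { $project: { "Tags._id": 1 } },
  { $unwind: "$Tags" },
  { $match: { $or: [{ "Tags._id": "tag1" }, { "Tags._id": "tag2" }] } },
  { $group: { _id: "$_id", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
];

// Only runs inside the mongo shell, where the global `db` exists.
if (typeof db !== "undefined") {
  printjson(explainAggregate(db.collection, pipeline));
}
```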

An important consideration with the Aggregation Framework is that an index can only be used to fetch the initial data for a pipeline (e.g. usage of $match, $sort, $geoNear at the beginning of a pipeline) as well as subsequent $lookup and $graphLookup stages. Once data has been fetched into the aggregation pipeline for processing (e.g. passing through stages like $project, $unwind, and $group) further manipulation will be in-memory (possibly using temporary files if the allowDiskUse option is set).
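
As a rough illustration of that rule, the sketch below walks a pipeline from the front and reports which leading stages are of a type that can use an index. leadingIndexEligibleStages is a hypothetical helper written for this example, and it is deliberately naive: it ignores $lookup and $graphLookup, and it ignores any reordering the server's optimizer may apply. Applied to the pipeline from the question, it finds no such stage, because the pipeline opens with $project.

```javascript
// Stage types the paragraph above lists as able to use an index
// when they appear at the beginning of a pipeline.
var INDEX_ELIGIBLE = ["$match", "$sort", "$geoNear"];

// Hypothetical helper: names of the leading stages that could use an index.
// Stops at the first stage of any other type.
function leadingIndexEligibleStages(pipeline) {
  var names = [];
  for (var i = 0; i < pipeline.length; i++) {
    var name = Object.keys(pipeline[i])[0];
    if (INDEX_ELIGIBLE.indexOf(name) === -1) break;
    names.push(name);
  }
  return names;
}

var questionPipeline = [
  { $project: { "Tags._id": 1 } },
  { $unwind: "$Tags" },
  { $match: { $or: [{ "Tags._id": "tag1" }, { "Tags._id": "tag2" }] } },
  { $group: { _id: "$_id", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
];

// Empty array: the first stage is $project, so nothing before it qualifies.
var eligible = leadingIndexEligibleStages(questionPipeline);

// Only runs inside the mongo shell. allowDiskUse lets later in-memory
// stages spill to temporary files.
if (typeof db !== "undefined") {
  db.collection.aggregate(questionPipeline, { allowDiskUse: true });
}
```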

Optimizing pipelines

In general, you can optimize aggregation pipelines by:

  • Starting a pipeline with a $match stage to restrict processing to relevant documents.
  • Ensuring the initial $match / $sort stages are supported by an efficient index.
  • Filtering data early using $match, $limit, and $skip.
  • Minimizing unnecessary stages and document manipulation (perhaps reconsidering your schema if complicated aggregation gymnastics are required).
  • Taking advantage of newer aggregation operators if you have upgraded your MongoDB server. For example, MongoDB 3.4 added many new aggregation stages and expressions including support for working with arrays, strings, and facets.
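
Applied to the pipeline from the question, the first two points in the list above might look like the sketch below. It assumes an index on "Tags._id" exists, which the question does not state. The leading $match narrows the input to documents containing either tag. The second $match is still needed after $unwind to drop the unwound elements that belong to other tags.

```javascript
var tags = ["tag1", "tag2"];

var optimizedPipeline = [
  // Leading $match: can use an index on "Tags._id" (assumed to exist)
  // to select only documents that contain at least one of the tags.
  { $match: { "Tags._id": { $in: tags } } },
  { $project: { "Tags._id": 1 } },
  { $unwind: "$Tags" },
  // After $unwind each document holds a single tag; keep only the wanted ones.
  { $match: { "Tags._id": { $in: tags } } },
  { $group: { _id: "$_id", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
];

// Only runs inside a MongoDB 3.0+ shell, where the global `db` exists.
if (typeof db !== "undefined") {
  printjson(db.collection.explain().aggregate(optimizedPipeline));
}
```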

There are also a number of Aggregation Pipeline Optimizations that automatically happen depending on your MongoDB server version. For example, adjacent stages may be coalesced and/or reordered to improve execution without affecting the output results.

Limitations

As at MongoDB 3.4, the Aggregation Framework explain option provides information on how a pipeline is processed but does not support the same level of detail as the executionStats mode for a find() query. If you are focused on optimizing initial query execution you will likely find it beneficial to review the equivalent find().explain() query with executionStats or allPlansExecution verbosity.
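
One way to get at the equivalent find() query is to reuse the filter from the pipeline's leading $match stage. The sketch below assumes the pipeline starts with $match. leadingMatchFilter is a hypothetical helper and the example pipeline is illustrative, not taken from the question.

```javascript
// Hypothetical helper: returns the filter of a leading $match stage,
// or an empty filter if the pipeline does not start with $match.
function leadingMatchFilter(pipeline) {
  var first = pipeline[0];
  return first && first.$match ? first.$match : {};
}

var examplePipeline = [
  { $match: { "Tags._id": { $in: ["tag1", "tag2"] } } },
  { $group: { _id: "$_id", count: { $sum: 1 } } }
];

var filter = leadingMatchFilter(examplePipeline);

// Only runs inside a MongoDB 3.0+ shell, where the global `db` exists.
// "allPlansExecution" can be passed instead for the most detailed output.
if (typeof db !== "undefined") {
  printjson(db.collection.find(filter).explain("executionStats"));
}
```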

There are a few relevant feature requests to watch/upvote in the MongoDB issue tracker regarding more detailed execution stats to help optimize/profile aggregation pipelines:

  • SERVER-19758: Add "executionStats" and "allPlansExecution" explain modes to aggregation explain
  • SERVER-21784: Track execution stats for each aggregation pipeline stage and expose via explain
  • SERVER-22622: Improve $lookup explain to indicate query plan on the "from" collection
Stennie answered Sep 30 '22 15:09