Is there an explain function for the Aggregation framework in MongoDB? I can't see it in the documentation.
If not, is there some other way to check how a query performs within the aggregation framework?
I know with find you just do
db.collection.find().explain()
But with the aggregation framework I get an error
db.collection.aggregate( { $project : { "Tags._id" : 1 }}, { $unwind : "$Tags" }, { $match: {$or: [{"Tags._id":"tag1"},{"Tags._id":"tag2"}]}}, { $group: { _id : { id: "$_id"}, "count": { $sum:1 } } }, { $sort: {"count":-1}} ).explain()
The MongoDB Aggregation Framework is a way to query data in MongoDB. It lets you break complex logic into a simple set of sequential operations: the output of one stage is fed as the input to the next stage until the desired result is achieved.
Aggregation operations process data records and return computed results. They group values from multiple documents together and can perform a variety of operations on the grouped data to return a single result. In SQL, count(*) with GROUP BY is the equivalent of a MongoDB aggregation.
The aggregation framework transforms data using stage operators. Here is the definition from the MongoDB documentation: the aggregation pipeline is a framework for data aggregation modeled on the concept of data processing pipelines.
Map-reduce is very slow in mongo because it is done by the JavaScript VM, not the database engine; the aggregation pipeline, by contrast, runs natively in the server. Map-reduce performance continues to be a limitation of this (very good, imo) db for time series data.
Starting with MongoDB version 3.0, simply changing the order from
collection.aggregate(...).explain()
to
collection.explain().aggregate(...)
will give you the desired results (documentation here).
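As a minimal sketch of that reordering, assuming a mongo shell session on MongoDB 3.0 or later, the pipeline from the question can be passed as an array to the explain helper. The name explainPipeline is a hypothetical wrapper introduced only for illustration, and the guard lets the snippet load outside the shell, where db is not defined.

```javascript
// The pipeline from the question, written as an array of stages.
var pipeline = [
  { $project: { "Tags._id": 1 } },
  { $unwind: "$Tags" },
  { $match: { $or: [{ "Tags._id": "tag1" }, { "Tags._id": "tag2" }] } },
  { $group: { _id: "$_id", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
];

// Hypothetical helper: call explain() on the collection first, then aggregate().
function explainPipeline(collection, stages) {
  return collection.explain().aggregate(stages);
}

// Only runs inside the mongo shell, where the global `db` exists.
if (typeof db !== "undefined") {
  printjson(explainPipeline(db.collection, pipeline));
}
```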
For older versions >= 2.6, you will need to use the explain option for aggregation pipeline operations (explain:true):
db.collection.aggregate([ { $project : { "Tags._id" : 1 }}, { $unwind : "$Tags" }, { $match: {$or: [{"Tags._id":"tag1"},{"Tags._id":"tag2"}]}}, { $group: { _id : "$_id", count: { $sum:1 } }}, {$sort: {"count":-1}} ], { explain:true } )
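The same option can also be sent as a raw aggregate command through db.runCommand. The following is only a sketch: buildExplainCommand is a hypothetical helper, the collection is assumed to be literally named collection, and the pipeline is shortened for illustration.

```javascript
// Hypothetical helper: build an aggregate command document with explain enabled.
// The command name ("aggregate") must be the first key in the document.
function buildExplainCommand(collectionName, stages) {
  return { aggregate: collectionName, pipeline: stages, explain: true };
}

// Shortened pipeline, for illustration only.
var command = buildExplainCommand("collection", [
  { $match: { "Tags._id": { $in: ["tag1", "tag2"] } } },
  { $group: { _id: "$_id", count: { $sum: 1 } } }
]);

// Only runs inside the mongo shell, where the global `db` exists.
if (typeof db !== "undefined") {
  printjson(db.runCommand(command));
}
```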
An important consideration with the Aggregation Framework is that an index can only be used to fetch the initial data for a pipeline (e.g. usage of $match, $sort, $geoNear at the beginning of a pipeline) as well as subsequent $lookup and $graphLookup stages. Once data has been fetched into the aggregation pipeline for processing (e.g. passing through stages like $project, $unwind, and $group) further manipulation will be in-memory (possibly using temporary files if the allowDiskUse option is set).
In general, you can optimize aggregation pipelines by:
- Starting the pipeline with a $match stage to restrict processing to relevant documents.
- Ensuring the initial $match / $sort stages are supported by an efficient index.
- Filtering data early using $match, $limit, and $skip.
There are also a number of Aggregation Pipeline Optimizations that automatically happen depending on your MongoDB server version. For example, adjacent stages may be coalesced and/or reordered to improve execution without affecting the output results.
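Applied to the pipeline from the question, the manual guidelines above might look like the sketch below. It assumes an index on "Tags._id" exists (the question does not say so), and buildTagCountPipeline is a hypothetical helper. The leading $match keeps only documents that contain at least one wanted tag, and the second $match after $unwind drops the other tags in those documents, so the counts should match the original pipeline.

```javascript
var tags = ["tag1", "tag2"];

// Hypothetical helper: put an index-friendly $match first, then repeat the
// filter after $unwind so only the matching array elements are counted.
function buildTagCountPipeline(tagIds) {
  var byTag = { "Tags._id": { $in: tagIds } };
  return [
    { $match: byTag },                // may use an index on "Tags._id" (assumed to exist)
    { $project: { "Tags._id": 1 } },
    { $unwind: "$Tags" },
    { $match: byTag },                // drops unwound elements with other tag ids
    { $group: { _id: "$_id", count: { $sum: 1 } } },
    { $sort: { count: -1 } }
  ];
}

var optimized = buildTagCountPipeline(tags);

// Only runs inside the mongo shell, where the global `db` exists.
if (typeof db !== "undefined") {
  printjson(db.collection.aggregate(optimized, { explain: true }));
}
```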
As at MongoDB 3.4, the Aggregation Framework explain option provides information on how a pipeline is processed but does not support the same level of detail as the executionStats mode for a find() query. If you are focused on optimizing initial query execution you will likely find it beneficial to review the equivalent find().explain() query with executionStats or allPlansExecution verbosity.
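One way to review the equivalent find() query is to reuse the filter from the pipeline's leading $match. This is a sketch only: explainInitialMatch is a hypothetical helper, and it assumes the first stage of the pipeline is the $match you care about.

```javascript
// Hypothetical helper: explain the find() equivalent of a pipeline's leading $match.
function explainInitialMatch(collection, stages, verbosity) {
  var first = stages[0] || {};
  var filter = first.$match || {};   // no leading $match means an unfiltered query
  return collection.find(filter).explain(verbosity || "executionStats");
}

// Only runs inside the mongo shell, where the global `db` exists.
if (typeof db !== "undefined") {
  var stats = explainInitialMatch(db.collection, [
    { $match: { "Tags._id": { $in: ["tag1", "tag2"] } } },
    { $unwind: "$Tags" }
  ]);
  printjson(stats.executionStats);
}
```

Passing "allPlansExecution" as the third argument requests the more verbose mode instead.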
There are a few relevant feature requests to watch/upvote in the MongoDB issue tracker regarding more detailed execution stats to help optimize/profile aggregation pipelines.