When doing a find() query, I can pass options such as limit and sort. I assume this means the database will stop scanning for matching documents once limit documents have been found.
When trying to accomplish the same thing with an aggregation pipeline, I put the same query from the find() into the $match stage, but I cannot specify a limit there.
My concern with $limit is that it only applies in the next stage, so MongoDB would have processed every single document in the collection before passing results on, which can become very slow as the collection grows.
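For concreteness, here is roughly what I mean (the collection name and filter are just placeholders):

> db.test.find({_id: {$lte: 3}}).limit(1)   // stops after 1 matching document

> db.test.aggregate([
    {$match: {_id: {$lte: 3}}},
    {$limit: 1}   // does this stop the scan early, or only run after $match has seen everything?
])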
The aggregation framework acts like a stream, so if you have a $limit
stage in your pipeline, it will stop processing once the limit is reached.
For example:
> db.test.find()
{ "_id": 0 }
{ "_id": 1 }
{ "_id": 2 }
{ "_id": 3 }
{ "_id": 4 }
Let's say I want to use a $match stage followed by a $limit stage. Using explain() on the aggregation:
> db.test.explain('executionStats').aggregate([
    {$match: {_id: {$lte: 3}}},
    {$limit: 1}
])
...
"executionStats": {
    "executionSuccess": true,
    "nReturned": 1,
    "executionTimeMillis": 0,
    "totalKeysExamined": 1,
    "totalDocsExamined": 1,
    ...
The explain output shows that MongoDB only examined 1 key (totalKeysExamined: 1) and 1 document (totalDocsExamined: 1) to return 1 document (nReturned: 1), thanks to the $limit stage. This is despite the $match specifying _id <= 3, which would otherwise select 4 documents.
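To compare, you can run the equivalent find() with a limit (same hypothetical test collection) and inspect its executionStats:

> db.test.explain('executionStats').find({_id: {$lte: 3}}).limit(1)

You should see comparable figures (nReturned: 1, totalDocsExamined: 1), which is the same early-exit behaviour the question expects from find() with a limit.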
Note that most stages work like this, with the exception of stages such as $group, which need to gather all of their input documents before they can do their work. Putting $limit after $group will not make the $limit stage act as a stream, since it has to wait for the $group stage to finish before it can limit the output.
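For example, in a pipeline like the following (hypothetical, using the same test collection), $limit cannot cut the scan short, because $group has to consume all of the $match output before it emits anything:

> db.test.explain('executionStats').aggregate([
    {$match: {_id: {$lte: 3}}},
    {$group: {_id: null, count: {$sum: 1}}},
    {$limit: 1}
])

Here you would expect totalDocsExamined to be 4 (every document matching _id <= 3), even though only one result is returned.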
Check out Explain Results and Aggregation Pipeline Optimization for more information.