Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to add a limit while performing $match stage in aggregation pipeline

When doing find() query, I am able to pass options such as limit and sort. This, I assume, means that the database will stop trying to filter matching documents after limit number of documents have been found.

When trying to accomplish the same thing with an aggregation pipeline, I put the same query from the find() as the $match stage, but i cannot specify a limit.

The problem with $limit is that it only occurs in the next stage, so mongo will have processed every single document in the collection before passing it to the next stage, which can become very slow as the collection increases in size.

like image 208
Meme Stream Avatar asked Sep 16 '25 15:09

Meme Stream


1 Answers

The aggregation framework acts like a stream, so if you have a $limit stage in your pipeline, it will stop processing once the limit is reached.

For example:

> db.test.find()
{ "_id": 0 }
{ "_id": 1 }
{ "_id": 2 }
{ "_id": 3 }
{ "_id": 4 }

Let's say I want to use a $match stage followed by $limit stage. Using explain() on the aggregation:

> db.test.explain('executionStats').aggregate([
      {$match: {_id: {$lte: 3}}},
      {$limit: 1}
  ])
...
        "executionStats": {
          "executionSuccess": true,
          "nReturned": 1,
          "executionTimeMillis": 0,
          "totalKeysExamined": 1,
          "totalDocsExamined": 1,
...

The explain output shows that MongoDB only examined 1 key (totalKeysExamined: 1) and 1 document (totalDocsExamined: 1) to return 1 document (nReturned: 1) due to the $limit stage. This is despite I'm specifying it to match _id <= 3, which should select 4 documents.

Note that most stages work like this, with the exception of stages such as $group which needs to gather all the required documents before it can perform grouping on them. Putting $limit after $group will not make the $limit stage act as a stream, since it has to wait for the $group stage to finish its process before it can limit the output.

Check out Explain Results and Aggregation Pipeline Optimization for more information.

like image 154
kevinadi Avatar answered Sep 19 '25 06:09

kevinadi