For efficiency, the MongoDB documentation recommends that limit statements immediately follow sort statements, which ends up producing the somewhat nonsensical:
collection.find(f).sort(s).limit(l).skip(p)
I say this is somewhat nonsensical because it seems to say take the first l items, and then drop the first p of those l. Since p is usually larger than l, you'd think you'd end up with no results, but in practice you end up with l results.
Aggregation works more as you'd expect:
collection.aggregate({$unwind: u}, {$group: g}, {$match: f}, {$sort: s}, {$limit: l}, {$skip: p})
returns 0 results if p >= l.
collection.aggregate({$unwind: u}, {$group: g}, {$match: f}, {$sort: s}, {$skip: p}, {$limit: l})
works, but the documentation seems to imply that this will fail if the match returns a result set that's larger than working memory. Is this true? If so, is there a better way to perform pagination on a result set returned through aggregation?
Source: the "Changed in version 2.4" comment at the end of this page: http://docs.mongodb.org/manual/reference/operator/aggregation/sort/
On large collections of millions of documents, MongoDB's aggregation has been shown to perform much worse than Elasticsearch. Performance degrades with collection size once MongoDB starts using the disk because of limited system RAM. A $lookup stage used without indexes can be very slow.
The pipeline provides efficient data aggregation using native operations within MongoDB, and is the preferred method for data aggregation in MongoDB. The aggregation pipeline can operate on a sharded collection. The aggregation pipeline can use indexes to improve its performance during some of its stages.
In MongoDB, cursor methods (i.e. when using find()) like limit, sort and skip can be applied in any order => order does not matter. find() returns a cursor to which the modifiers are applied. Sort is always done before limit, and skip is done before limit as well. So, in other words, the order is: sort -> skip -> limit.
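For example, these two queries return the same documents despite the different call order (the users collection and age field here are hypothetical, just for illustration):

// Both queries return the same result: the server always applies
// sort first, then skip, then limit, regardless of call order.
db.users.find().limit(5).skip(10).sort({ age: 1 })
db.users.find().sort({ age: 1 }).skip(10).limit(5)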
The aggregation framework does not return a database cursor. Instead, it returns a document with the results of the aggregation. It works by producing intermediate results at each step of the pipeline, so the order of operations really matters.
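As a sketch of why the order matters, using the same call style as the question (the orders collection is hypothetical and assumed to hold at least 30 documents):

// $sort -> $skip -> $limit: skip 20 documents, then take the next 10.
db.orders.aggregate({ $sort: { _id: 1 } }, { $skip: 20 }, { $limit: 10 })
// $sort -> $limit -> $skip: take the first 10, then skip 20 of those
// 10, which yields zero documents, as described in the question.
db.orders.aggregate({ $sort: { _id: 1 } }, { $limit: 10 }, { $skip: 20 })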
I guess MongoDB does not honour the order of cursor modifier methods because of the way cursors are implemented internally.
You can't paginate the result of the aggregation framework, because there is only a single document with the results. You can still paginate a regular query by using skip and limit, but a better practice is to use a range query, due to its efficiency in using an index.
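A minimal sketch of such a range query, assuming pages are walked in _id order and that the first page is non-empty (the items collection and the variable names are illustrative):

var pageSize = 20;
// First page: plain sort + limit, no skip involved.
var page = db.items.find().sort({ _id: 1 }).limit(pageSize).toArray();
// Remember where the page ended...
var lastId = page[page.length - 1]._id;
// ...and use an indexed range query instead of skip for the next page.
db.items.find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(pageSize)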
UPDATE:
Since v2.6 the MongoDB aggregation framework returns a cursor instead of a single document. Compare: v2.4 and v2.6.
The documentation seems to imply that this (aggregation) will fail if the match returns a result set that's larger than working memory. Is this true?
No. You can, for example, aggregate on a collection that is larger than physical memory without even using the $match operator. It might be slow, but it should work. There is no problem if $match returns something that is larger than RAM.
Here are the actual pipeline limits.
http://docs.mongodb.org/manual/core/aggregation-pipeline-limits/
The $match operator by itself does not cause memory problems. As stated in the documentation, $group and $sort are the usual villains. They are cumulative, and might require access to the entire input set before they can produce any output. If they load too much data into physical memory, they will fail.
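Since v2.6 (the same version mentioned in the update above) you can let those stages spill to disk with the allowDiskUse option. A minimal sketch, assuming a hypothetical logs collection with a level field:

// allowDiskUse lets $group and $sort write temporary files to disk
// instead of aborting when a stage exceeds its memory limit.
db.logs.aggregate(
    [
        { $group: { _id: "$level", count: { $sum: 1 } } },
        { $sort: { count: -1 } }
    ],
    { allowDiskUse: true }
)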
If so, is there a better way to perform pagination on a result set returned through aggregation?
It has been correctly said that you cannot "paginate" (apply $skip and $limit) on the result of the aggregation, because it is simply a MongoDB document. But you can "paginate" on the intermediate results of the aggregation pipeline.
Using $limit in the pipeline will help keep the result set within the 16 MB bound, the maximum BSON document size. Even if the collection grows, you should be safe.
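A minimal sketch of paginating on the intermediate results, in the same call style as the question (the orders collection, the match filter and the page variables are all hypothetical):

var pageSize = 25;
var pageNumber = 3; // zero-based page index
// $skip and $limit run as pipeline stages here, so pagination happens
// on the intermediate results, not on the returned document.
db.orders.aggregate(
    { $match: { status: "shipped" } },
    { $sort: { createdAt: -1 } },
    { $skip: pageNumber * pageSize },
    { $limit: pageSize }
)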
Problems could arise with $group and, especially, $sort. You can create "sort-friendly" indexes to deal with them if they do actually happen. Have a look at the documentation on indexing strategies.
http://docs.mongodb.org/manual/tutorial/sort-results-with-indexes/
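For instance, if the pipeline sorts on a hypothetical createdAt field, an index matching the sort specification lets MongoDB read documents in order instead of sorting them in memory (older shells use ensureIndex instead of createIndex):

// A single-field index serves both ascending and descending sorts
// on that field.
db.orders.createIndex({ createdAt: -1 })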
Finally, be aware that $skip does not help with performance. On the contrary, it tends to slow down the application, since it forces MongoDB to scan every skipped document to reach the desired point in the collection.
http://docs.mongodb.org/manual/reference/method/cursor.skip/