How to paginate a query sorting by a non-unique field? For example docs in a collection may be (sorted by s:1, then _id:-1):
{_id: 19, s: 3},
{_id: 17, s: 3},
{_id: 58, s: 4},
// etc...
There is a trivial limit/skip method that works... slowly.
Is it possible to use something like:
db.collection.find()
.sort({s:1, _id:-1})
.min({s:3, _id:17}) // this does not work as wanted!
.limit(2);
to retrieve
{_id: 17, s: 3},
{_id: 58, s: 4}
?
If you want to paginate by "page numbers" then you are pretty much stuck with the .limit()
and .skip()
methods applied after you sort the results on your key. You might have done some reading and found that it is "not very efficient", mostly due to the cost of "skipping" through "n" results in order to reach a certain page.
But the principle is sound where you need it:
db.collection.find().sort({ "s": -1, "_id": 1 }).skip(<page-1>).limit(<pageSize>)
Provided you need to only move "forward" in your paging there is a faster alternative to work with, and also for "sorted" results.
The key is to keep a reference to the "last seen" value of "s" and then generally a list of _id
values until that value of "s" changes. So demonstrating with a few more documents, already sorted for demonstration purposes:
{ "_id": 1, "s": 3 },
{ "_id": 2, "s": 3 },
{ "_id": 3, "s": 3 },
{ "_id": 4, "s": 2 },
{ "_id": 5, "s": 1 },
{ "_id": 6, "s": 1 },
In order to get the "first page" of "two" results your first query is simple:
db.collection.find().sort({ "s": -1, "_id": 1}).limit(2)
But follow that through into processing the documents:
var lastVal = null,
lastSeen = [];
db.collection.find().sort({ "s": -1, "_id": 1}).limit(2).forEach(function(doc) {
if ( doc.s != lastVal ) { // Change when different
lastVal = doc.s;
lastSeen = [];
}
lastSeen.push(doc._id); // Push _id onto array
// do other things like output
})
So on that first iteration the lastVal
value will be 3
and the lastSeen
will contain both document _id
values in the array [1,2]
.
These things you would store in something like user session data awaiting the next page request.
On your request for the next page set then you issue as follows:
var lastVal = 3,
lastSeen = [1,2];
db.collection.find({
"_id": { "$nin": lastSeen },
"s": { "$lte": lastVal }
}).sort({ "s": -1, "_id": 1}).limit(2).forEach(function(doc) {
if ( doc.s != lastVal ) { // Change when different
lastVal = doc.s;
lastSeen = [];
}
lastSeen.push(doc._id); // Push _id onto array
// do other things like output
})
That asks that both the selection of "s" needs to start from a value "less than or equal to" ( because of the direction of the sort ) the lastVal
recorded, and that the "_id" field cannot contain the values recorded in lastSeen
.
The resulting next page is:
{ "_id": 3, "s": 3 },
{ "_id": 4, "s": 2 },
But now if you follow the logic the lastVal
is of course 2
and the lastSeen
now only has the single array element [4]
. Since the next query only needs to follow on from 2
as a less or equal value there is no need to keep the other previously seen "_id" values as they would not be within that selection.
And then the process just follows on:
var lastVal = 2,
lastSeen = [2];
db.collection.find({
"_id": { "$nin": lastSeen },
"s": { "$lte": lastVal }
}).sort({ "s": -1, "_id": 1}).limit(2).forEach(function(doc) {
if ( doc.s != lastVal ) { // Change when different
lastVal = doc.s;
lastSeen = [];
}
lastSeen.push(doc._id); // Push _id onto array
// do other things like output
})
So by following that pattern of logic you can "store" that information found from your "previousc page" of results and move "forward" through results very efficiently.
But if you need to jump to "page 20" or similar types of operations, then you are stuck with .limit()
and .skip()
. It's slower that way, but it depends on what you can live with.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With