Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sort and paginate a collection

How to paginate a query sorting by a non-unique field? For example docs in a collection may be (sorted by s:1, then _id:-1):

{_id: 19, s: 3},
{_id: 17, s: 3},
{_id: 58, s: 4},
// etc...

There is a trivial limit/skip method that works... slowly.

Is it possible to use something like:

db.collection.find()
  .sort({s:1, _id:-1})
  .min({s:3, _id:17})    // this does not work as wanted!
  .limit(2);

to retrieve

{_id: 17, s: 3},
{_id: 58, s: 4}

?

like image 272
Andrei Toutoukine Avatar asked Feb 10 '23 16:02

Andrei Toutoukine


1 Answers

If you want to paginate by "page numbers" then you are pretty much stuck with the .limit() and .skip() methods applied after you sort the results on your key. You might have done some reading and found that it is "not very efficient", mostly due to the cost of "skipping" through "n" results in order to reach a certain page.

But the principle is sound where you need it:

db.collection.find().sort({ "s": -1, "_id": 1 }).skip(<page-1>).limit(<pageSize>)

Provided you need to only move "forward" in your paging there is a faster alternative to work with, and also for "sorted" results.

The key is to keep a reference to the "last seen" value of "s" and then generally a list of _id values until that value of "s" changes. So demonstrating with a few more documents, already sorted for demonstration purposes:

{ "_id": 1, "s": 3 },
{ "_id": 2, "s": 3 },
{ "_id": 3, "s": 3 },
{ "_id": 4, "s": 2 },
{ "_id": 5, "s": 1 },
{ "_id": 6, "s": 1 },

In order to get the "first page" of "two" results your first query is simple:

db.collection.find().sort({ "s": -1, "_id": 1}).limit(2)

But follow that through into processing the documents:

var lastVal = null,
    lastSeen = [];

db.collection.find().sort({ "s": -1, "_id": 1}).limit(2).forEach(function(doc) {
    if ( doc.s != lastVal ) {    // Change when different
        lastVal = doc.s;
        lastSeen = [];
    }
    lastSeen.push(doc._id);      // Push _id onto array
    // do other things like output
})

So on that first iteration the lastVal value will be 3 and the lastSeen will contain both document _id values in the array [1,2]. These things you would store in something like user session data awaiting the next page request.

On your request for the next page set then you issue as follows:

var lastVal = 3,
    lastSeen = [1,2];

db.collection.find({ 
    "_id": { "$nin": lastSeen }, 
    "s": { "$lte": lastVal }
}).sort({ "s": -1, "_id": 1}).limit(2).forEach(function(doc) {
    if ( doc.s != lastVal ) {    // Change when different
        lastVal = doc.s;
        lastSeen = [];
    }
    lastSeen.push(doc._id);      // Push _id onto array
    // do other things like output
})

That asks that both the selection of "s" needs to start from a value "less than or equal to" ( because of the direction of the sort ) the lastVal recorded, and that the "_id" field cannot contain the values recorded in lastSeen.

The resulting next page is:

{ "_id": 3, "s": 3 },
{ "_id": 4, "s": 2 },

But now if you follow the logic the lastVal is of course 2 and the lastSeen now only has the single array element [4]. Since the next query only needs to follow on from 2 as a less or equal value there is no need to keep the other previously seen "_id" values as they would not be within that selection.

And then the process just follows on:

var lastVal = 2,
    lastSeen = [2];

db.collection.find({ 
    "_id": { "$nin": lastSeen }, 
    "s": { "$lte": lastVal }
}).sort({ "s": -1, "_id": 1}).limit(2).forEach(function(doc) {
    if ( doc.s != lastVal ) {    // Change when different
        lastVal = doc.s;
        lastSeen = [];
    }
    lastSeen.push(doc._id);      // Push _id onto array
    // do other things like output
})

So by following that pattern of logic you can "store" that information found from your "previousc page" of results and move "forward" through results very efficiently.

But if you need to jump to "page 20" or similar types of operations, then you are stuck with .limit() and .skip(). It's slower that way, but it depends on what you can live with.

like image 186
Blakes Seven Avatar answered Feb 26 '23 23:02

Blakes Seven