I am familiar with the best practice of range-based pagination on large MongoDB collections, but I am struggling to figure out how to paginate a collection when the sort is on a non-unique field.
For example, I have a large collection of users, and there is a field for the number of times they have done something. This field is definitely non-unique, and there could be large groups of documents that share the same value.
I would like to return results sorted by that 'numTimesDoneSomething' field.
Here is a sample data set:
{_id: ObjectId("50c480d81ff137e805000003"), numTimesDoneSomething: 12}
{_id: ObjectId("50c480d81ff137e805000005"), numTimesDoneSomething: 9}
{_id: ObjectId("50c480d81ff137e805000006"), numTimesDoneSomething: 7}
{_id: ObjectId("50c480d81ff137e805000007"), numTimesDoneSomething: 1}
{_id: ObjectId("50c480d81ff137e805000002"), numTimesDoneSomething: 15}
{_id: ObjectId("50c480d81ff137e805000008"), numTimesDoneSomething: 1}
{_id: ObjectId("50c480d81ff137e805000009"), numTimesDoneSomething: 1}
{_id: ObjectId("50c480d81ff137e805000004"), numTimesDoneSomething: 12}
{_id: ObjectId("50c480d81ff137e805000010"), numTimesDoneSomething: 1}
{_id: ObjectId("50c480d81ff137e805000011"), numTimesDoneSomething: 1}
How would I return this data set sorted by 'numTimesDoneSomething' with 2 records per page?
@cubbuk shows a good example using an offset (skip), but you can also adapt the query he shows for ranged pagination:
db.collection.find().sort({numTimesDoneSomething:-1, _id:1})
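Since the _id here is unique and you are using it as the secondary sort key, you can range on it, and the results, even between the two records having numTimesDoneSomething of 12, should be consistent as to whether they fall on one page or the next.
Because numTimesDoneSomething is the primary sort key, the range condition has to cover both fields rather than _id alone. So, with last_count and last_id taken from the last document of the previous page, something as simple as
var q = db.collection.find({$or: [
  {numTimesDoneSomething: {$lt: last_count}},
  {numTimesDoneSomething: last_count, _id: {$gt: last_id}}
]}).sort({numTimesDoneSomething:-1, _id:1}).limit(2)
should work quite well for ranged pagination.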
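To check how the pages come out for the sample data in the question, here is a minimal sketch that runs in plain Node without a server. It assumes the sort is {numTimesDoneSomething: -1, _id: 1}. The names compareDocs, nextPage, nextPageFilter and walkPages are hypothetical helpers, not part of any MongoDB API, and the _id values are kept as hex strings for simplicity.

```javascript
// Sample data from the question. The _id values are plain hex strings here;
// fixed-length hex strings compare in the same order as the ObjectIds would.
const users = [
  { _id: "50c480d81ff137e805000003", numTimesDoneSomething: 12 },
  { _id: "50c480d81ff137e805000005", numTimesDoneSomething: 9 },
  { _id: "50c480d81ff137e805000006", numTimesDoneSomething: 7 },
  { _id: "50c480d81ff137e805000007", numTimesDoneSomething: 1 },
  { _id: "50c480d81ff137e805000002", numTimesDoneSomething: 15 },
  { _id: "50c480d81ff137e805000008", numTimesDoneSomething: 1 },
  { _id: "50c480d81ff137e805000009", numTimesDoneSomething: 1 },
  { _id: "50c480d81ff137e805000004", numTimesDoneSomething: 12 },
  { _id: "50c480d81ff137e805000010", numTimesDoneSomething: 1 },
  { _id: "50c480d81ff137e805000011", numTimesDoneSomething: 1 },
];

// Same ordering as sort({numTimesDoneSomething: -1, _id: 1}).
function compareDocs(a, b) {
  if (a.numTimesDoneSomething !== b.numTimesDoneSomething) {
    return b.numTimesDoneSomething - a.numTimesDoneSomething; // descending
  }
  return a._id < b._id ? -1 : a._id > b._id ? 1 : 0; // ascending tie-breaker
}

// In-memory stand-in for find(filter).sort(...).limit(limit):
// returns up to `limit` documents that sort strictly after `lastDoc`.
function nextPage(docs, lastDoc, limit) {
  return docs
    .slice()
    .sort(compareDocs)
    .filter((d) => !lastDoc || compareDocs(d, lastDoc) > 0)
    .slice(0, limit);
}

// The same "strictly after lastDoc" condition written as a query filter
// (hypothetical helper; pass the result to find() in the shell).
function nextPageFilter(lastDoc) {
  if (!lastDoc) return {}; // first page: no range condition
  return {
    $or: [
      { numTimesDoneSomething: { $lt: lastDoc.numTimesDoneSomething } },
      { numTimesDoneSomething: lastDoc.numTimesDoneSomething, _id: { $gt: lastDoc._id } },
    ],
  };
}

// Walks the whole data set page by page, carrying the last document forward.
function walkPages(docs, limit) {
  const pages = [];
  let lastDoc = null;
  for (;;) {
    const page = nextPage(docs, lastDoc, limit);
    if (page.length === 0) break;
    pages.push(page);
    lastDoc = page[page.length - 1];
  }
  return pages;
}

console.log(JSON.stringify(walkPages(users, 2).map((p) => p.map((d) => d.numTimesDoneSomething))));
// prints [[15,12],[12,9],[7,1],[1,1],[1,1]]
```

The two records with a count of 12 land on different pages, but each appears exactly once, because the tie is always broken by _id. Against a real collection, a compound index on the same two fields would likely help this query, though that is worth confirming with explain().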
You can sort on multiple fields, in this case on numTimesDoneSomething and the _id field. Since the _id field is already ascending according to the insertion timestamp, you will be able to paginate through the collection without iterating over duplicate data, unless new data is inserted during the iteration.
db.collection.find().sort({numTimesDoneSomething:-1, _id:1}).skip(index).limit(2)
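As a rough illustration of how the offset relates to a page number, here is a small sketch that mimics the sort, skip and limit steps in memory. The helpers skipFor and pageBySkip are hypothetical names, page numbers are assumed to start at 1, and the sample uses a few documents from the question with _id values as hex strings.

```javascript
// Hypothetical helper: converts a 1-based page number into a skip count.
function skipFor(pageNumber, pageSize) {
  if (pageNumber < 1) throw new RangeError("pageNumber starts at 1");
  return (pageNumber - 1) * pageSize;
}

// In-memory equivalent of
// sort({numTimesDoneSomething: -1, _id: 1}).skip(n).limit(pageSize).
function pageBySkip(docs, pageNumber, pageSize) {
  const start = skipFor(pageNumber, pageSize);
  return docs
    .slice()
    .sort(
      (a, b) =>
        b.numTimesDoneSomething - a.numTimesDoneSomething || // count, descending
        (a._id < b._id ? -1 : a._id > b._id ? 1 : 0) // _id, ascending
    )
    .slice(start, start + pageSize);
}

// A few documents from the question.
const sample = [
  { _id: "50c480d81ff137e805000003", numTimesDoneSomething: 12 },
  { _id: "50c480d81ff137e805000005", numTimesDoneSomething: 9 },
  { _id: "50c480d81ff137e805000002", numTimesDoneSomething: 15 },
  { _id: "50c480d81ff137e805000004", numTimesDoneSomething: 12 },
  { _id: "50c480d81ff137e805000007", numTimesDoneSomething: 1 },
];

console.log(JSON.stringify(pageBySkip(sample, 2, 2).map((d) => d._id)));
// prints ["50c480d81ff137e805000004","50c480d81ff137e805000005"]
```

In the shell, the value returned by skipFor would be what gets passed to skip(). The server still has to walk past the skipped documents, so this approach tends to slow down on deep pages of a large collection.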