Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MongoDB - Pagination based on non-unique fields

Tags:

mongodb

I am familiar with the best practice of range based pagination on large MongoDB collections, however I am struggling with figuring out how to paginate a collection where the sort value is on a non-unique field.

For example, I have a large collection of users, and there is a field for the number of times they have done something. This field is defintely non-unique, and could have large groups of documents that have the same value.

I would like to return results sorted by that 'numTimesDoneSomething' field.

Here is a sample data set:

{_id: ObjectId("50c480d81ff137e805000003"), numTimesDoneSomething: 12}
{_id: ObjectId("50c480d81ff137e805000005"), numTimesDoneSomething: 9}
{_id: ObjectId("50c480d81ff137e805000006"), numTimesDoneSomething: 7}
{_id: ObjectId("50c480d81ff137e805000007"), numTimesDoneSomething: 1}
{_id: ObjectId("50c480d81ff137e805000002"), numTimesDoneSomething: 15}
{_id: ObjectId("50c480d81ff137e805000008"), numTimesDoneSomething: 1}
{_id: ObjectId("50c480d81ff137e805000009"), numTimesDoneSomething: 1}
{_id: ObjectId("50c480d81ff137e805000004"), numTimesDoneSomething: 12}
{_id: ObjectId("50c480d81ff137e805000010"), numTimesDoneSomething: 1}
{_id: ObjectId("50c480d81ff137e805000011"), numTimesDoneSomething: 1}

How would I return this data set sorted by 'numTimesDoneSomething' with 2 records per page?

like image 976
Adam Duro Avatar asked Jan 10 '13 06:01

Adam Duro


2 Answers

@cubbuk shows a good example using offset (skip) but you can also mould the query he shows for ranged pagination as well:

db.collection.find().sort({numTimesDoneSomething:-1, _id:1})

Since the _id here will be unique and you are seconding on it you can actually then range by _id and the results, even between two records having numTimesDoneSomething of 12, should be consistent as to whether they should be on one page or the next.

So doing something as simple as

var q = db.collection.find({_id: {$gt: last_id}}).sort({numTimesDoneSomething:-1, _id:1}).limit(2)

Should work quite good for ranged pagination.

like image 115
Sammaye Avatar answered Oct 17 '22 07:10

Sammaye


You can sort on multiple fields in this case sort on numTimesDoneSomething and id field. Since id_ field is ascending in itself already according to the insertion timestamp, you will able to paginate through the collection without iterating over duplicate data unless new data is inserted during the iteration.

db.collection.find().sort({numTimesDoneSomething:-1, _id:1}).offset(index).limit(2)
like image 41
cubbuk Avatar answered Oct 17 '22 05:10

cubbuk