
Slow pagination over tons of records in mongodb

Tags:

mongodb

I have over 300k records in one collection in Mongo.

When I run this very simple query:

db.myCollection.find().limit(5); 

It takes only a few milliseconds.

But when I use skip in the query:

db.myCollection.find().skip(200000).limit(5) 

It never returns anything: it runs for minutes and produces no result.

How can I make this faster?

Radek Simko asked Aug 29 '11 09:08




2 Answers

One approach to this problem, if you have a large number of documents and you are displaying them in sorted order (I'm not sure how useful skip is if you're not), is to use the key you're sorting on to select the next page of results.

So if you start with

db.myCollection.find().sort({created_date: 1}).limit(100);

and then store the created_date of the last document returned by the cursor in a variable max_created_date_from_last_result, you can get the next page with a far more efficient query (presuming you have an index on created_date):

db.myCollection.find({created_date: {$gt: max_created_date_from_last_result}}).sort({created_date: 1}).limit(100);
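One caveat with the $gt query above: if several documents share the same created_date across a page boundary, a plain $gt would skip the ones left over. A common workaround is to add a unique field as a tie-breaker. The sketch below is only an illustration of that idea; the helper name nextPageFilter and the use of _id as the tie-breaker are my assumptions, not part of the approach described above.

```javascript
// Hypothetical helper (not a MongoDB API): builds the filter for the page
// that follows `lastDoc`, the last document of the previous page.
// Assumes every page is sorted by { created_date: 1, _id: 1 } and that an
// index covering those fields exists.
function nextPageFilter(lastDoc) {
  if (!lastDoc) {
    return {}; // first page: no lower bound yet
  }
  return {
    $or: [
      // strictly newer documents
      { created_date: { $gt: lastDoc.created_date } },
      // same created_date: fall back to _id to break the tie
      { created_date: lastDoc.created_date, _id: { $gt: lastDoc._id } },
    ],
  };
}

// In the mongo shell this would be used roughly as:
// db.myCollection.find(nextPageFilter(last)).sort({created_date: 1, _id: 1}).limit(100);
```

If created_date is already unique in your collection, the simpler $gt query is enough and the tie-breaker is unnecessary.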
Russell answered Oct 06 '22 01:10


From MongoDB documentation:

Paging Costs

Unfortunately skip can be (very) costly and requires the server to walk from the beginning of the collection, or index, to get to the offset/skip position before it can start returning the page of data (limit). As the page number increases skip will become slower and more cpu intensive, and possibly IO bound, with larger collections.

Range based paging provides better use of indexes but does not allow you to easily jump to a specific page.
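To make the cost difference concrete, here is a toy model in plain JavaScript. It is not how MongoDB is implemented: a sorted array stands in for an index, and the function names pageBySkip and pageByRange are made up for this sketch. It only counts how many entries each strategy has to touch to produce the same page.

```javascript
// Toy stand-in for an index: 300,000 keys in sorted order.
const index = Array.from({ length: 300000 }, (_, i) => i);

// skip/limit style: step over every entry before the offset, one by one.
function pageBySkip(sorted, skip, limit) {
  let visited = 0;
  const page = [];
  for (const key of sorted) {
    visited++;
    if (visited <= skip) continue; // still inside the skipped prefix
    page.push(key);
    if (page.length === limit) break;
  }
  return { page, visited };
}

// Range style: binary-search for the first key greater than lastKey,
// then read `limit` entries from there.
function pageByRange(sorted, lastKey, limit) {
  let lo = 0;
  let hi = sorted.length;
  let probes = 0;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    probes++;
    if (sorted[mid] <= lastKey) lo = mid + 1;
    else hi = mid;
  }
  const page = sorted.slice(lo, lo + limit);
  return { page, visited: probes + page.length };
}

const bySkip = pageBySkip(index, 200000, 5);
const byRange = pageByRange(index, 199999, 5);
console.log("skip visited:", bySkip.visited, "range visited:", byRange.visited);
```

Both calls return the same five keys, but the skip version touches every entry before the offset, while the range version touches only a handful. The trade-off is the one quoted above: the range version needs the last key of the previous page, so it cannot jump straight to an arbitrary page number.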

You have to ask yourself a question: how often do you really need the 40,000th page? Also see this article.

Tomasz Nurkiewicz answered Oct 06 '22 00:10