Most of the research I've seen on pagination with CouchDB suggests that what you need to do is take the first ten (or however many) items from your view, then record the last document's docid and pass it on to the next page. Unfortunately, I can see a few glaring issues with that method.
Unfortunately, my research has shown that using skip
develops considerable slowdown for datasets 5000 records or larger, and would be positively crippling once you reached anything really huge (going to page 20000 with 10 records to a page would take about 20 seconds - and yes, there are datasets that big in production). So that's not really an option.
So, what I'm asking is, is there an efficient way to paginate view results in CouchDB that can get all the items from an arbitrary page? (I'm using couchdb-python, but hopefully there isn't anything about this that would be client-dependent.)
I'm new to CouchDB, but I think I might be able to help. I read the following from CouchDB: The Definitive Guide:
One drawback of the linked list style pagination is that... jumping to a specific page doesn’t really work... If you really do need jump to page over the full range of documents... you can still maintain an integer value index as the view index and have a hybrid approach at solving pagination.
— http://books.couchdb.org/relax/receipts/pagination
If I'm reading that right, the approach in your case is going to be:
For step 1, you need to actually add something like "page_seq" as a field to your document. I don't have a specific recommendation on how you obtain this number, and am curious to know what people think. For this scheme to work, it has to increment by exactly 1 for each new record, so RDBMS sequences are probably out (the ones I'm familiar with may skip numbers).
For step 2, you'd write a view with a map function that's something like this (in Javascript):
function(doc):
emit(doc.page_seq, doc)
For step 3, you'd write your query something like this (assuming the page_seq and page numbering sequences start at 1):
results = db.view("name_of_view")
page_size = ... # say, 20
page_no = ... # 1 = page 1, 2 = page 2, etc.
begin = ((page_no - 1) * page_size) + 1
end = begin + page_size
my_page = results[begin:end]
and then you can iterate through my_page.
A clear drawback of this is that page_seq assumes you're not filtering the data set for your view, and you'll quickly run into trouble if you're trying to get this to work with an arbitrary query.
Comments/improvements welcome.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With