 

Stateless pagination in CouchDB?

Most of the research I've seen on pagination with CouchDB suggests that what you need to do is take the first ten (or however many) items from your view, then record the last document's docid and pass it on to the next page. Unfortunately, I can see a few glaring issues with that method.

  • It apparently makes it impossible to skip around within the set of pages (if someone jumps directly to page 100, you would have to run the queries for pages 2-99 so you would know how to load page 100).
  • It requires you to pass around possibly a lot of state information between your pages.
  • It's difficult to code properly.
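To make the linked-list style concrete, here is a minimal in-memory sketch. It assumes the view rows are already sorted by (key, docid), which is how CouchDB orders view rows. It follows one common variant of the pattern: fetch one extra row and use its key/docid as the starting point of the next page. A real query would pass `startkey`, `startkey_docid` and `limit=page_size + 1` to CouchDB. The `fetch_page` helper and the tuple "cursor" are hypothetical stand-ins, not couchdb-python APIs.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Row = Tuple[int, str]  # (view key, docid), in CouchDB's view sort order


def fetch_page(rows: List[Row], page_size: int,
               cursor: Optional[Row] = None) -> Tuple[List[Row], Optional[Row]]:
    """Return one page plus the (startkey, startkey_docid) cursor for the next page."""
    # No cursor means "first page"; otherwise resume at the remembered row.
    start = 0 if cursor is None else bisect_left(rows, cursor)
    # Ask for one extra row: if present, it is the first row of the next page.
    window = rows[start:start + page_size + 1]
    next_cursor = window[page_size] if len(window) > page_size else None
    return window[:page_size], next_cursor


rows = [(i, f"doc{i:03d}") for i in range(1, 26)]
page1, cur = fetch_page(rows, 10)
page2, cur = fetch_page(rows, 10, cur)  # page 3 is only reachable via this cursor
```

This also illustrates the first complaint above: the only way to obtain the cursor for page 3 is to fetch pages 1 and 2 first.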

Unfortunately, my research has shown that using skip causes considerable slowdown for datasets of 5,000 records or more, and would be positively crippling once you reached anything really huge (going to page 20000 with 10 records per page would take about 20 seconds, and yes, there are datasets that big in production). So that's not really an option.
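For comparison, the skip-based approach being ruled out here boils down to the view options below. `skip` and `limit` are standard CouchDB view query parameters. The helper name `skip_query_options` and the design/view name in the comment are hypothetical. The point of the sketch is the arithmetic: for the question's example (page 20000, 10 rows per page), CouchDB has to step past 199,990 rows before it returns anything.

```python
def skip_query_options(page_no: int, page_size: int) -> dict:
    """View query options for 1-based page `page_no` using skip/limit.

    CouchDB still walks past every skipped row, so the cost grows roughly
    with (page_no - 1) * page_size.
    """
    if page_no < 1 or page_size < 1:
        raise ValueError("page_no and page_size must be >= 1")
    return {"skip": (page_no - 1) * page_size, "limit": page_size}


# With couchdb-python these would be passed through as view options, e.g.:
# db.view("design_name/view_name", **skip_query_options(20000, 10))
```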

So what I'm asking is: is there an efficient way to paginate view results in CouchDB that can fetch all the items on an arbitrary page? (I'm using couchdb-python, but hopefully nothing about this is client-dependent.)

LeafStorm asked Jun 21 '10 01:06



1 Answer

I'm new to CouchDB, but I think I can help. I read the following in CouchDB: The Definitive Guide:

One drawback of the linked list style pagination is that... jumping to a specific page doesn’t really work... If you really do need jump to page over the full range of documents... you can still maintain an integer value index as the view index and have a hybrid approach at solving pagination.
   — http://books.couchdb.org/relax/receipts/pagination

If I'm reading that right, the approach in your case is going to be:

  1. Embed a numeric sequence into your document set.
  2. Extract that numeric sequence to a numeric view index.
  3. Use arithmetic to calculate the correct start/end numeric keys for your arbitrary page.

For step 1, you need to add a field such as "page_seq" to each document. I don't have a specific recommendation on how to obtain this number, and am curious to know what people think. For this scheme to work, the number has to increase by exactly 1 for each new record, so RDBMS sequences are probably out (the ones I'm familiar with can skip numbers).
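Since the question of how to obtain the number is left open, here is one possibility, offered only as a sketch: renumber the whole set in a single batch, sorted by some stable field. The `created` field and the `assign_page_seq` helper are hypothetical. The sketch assumes no other writer inserts documents while the batch runs; a deletion still leaves a gap until the next renumbering.

```python
from typing import Callable, Dict, List


def assign_page_seq(docs: List[Dict], sort_key: Callable[[Dict], object]) -> List[Dict]:
    """Number docs 1..n in a stable order by writing a `page_seq` field.

    Assumption: this runs as one offline batch, so the numbering stays
    gap-free and unique only as long as nothing else writes meanwhile.
    """
    ordered = sorted(docs, key=sort_key)
    for seq, doc in enumerate(ordered, start=1):
        doc["page_seq"] = seq  # mutates each dict in place
    return ordered


docs = [{"_id": "b", "created": 2}, {"_id": "a", "created": 1}, {"_id": "c", "created": 3}]
renumbered = assign_page_seq(docs, lambda d: d["created"])
# With couchdb-python, the changed documents could then be saved in bulk:
# db.update(renumbered)
```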

For step 2, you'd write a view with a map function something like this (in JavaScript):

function(doc) {
    emit(doc.page_seq, doc);
}
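From couchdb-python, one way to install such a view is to save a design document that holds the map function as a string. This is a sketch: the design document name `pages`, the view name `by_seq` and the helper `page_seq_design_doc` are hypothetical, and the `if` guard (skip documents without a `page_seq`) is an addition of this sketch.

```python
def page_seq_design_doc() -> dict:
    """Build a design document containing a view keyed on page_seq.

    The names "_design/pages" and "by_seq" are hypothetical placeholders.
    """
    map_fn = (
        "function(doc) {\n"
        "  if (doc.page_seq !== undefined) {\n"
        "    emit(doc.page_seq, doc);\n"
        "  }\n"
        "}"
    )
    return {
        "_id": "_design/pages",
        "language": "javascript",
        "views": {"by_seq": {"map": map_fn}},
    }


# With couchdb-python (not run here):
# db.save(page_seq_design_doc())
# results = db.view("pages/by_seq")
```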

For step 3, you'd write your query something like this (assuming the page_seq and page numbering sequences start at 1):

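import couchdb

db = couchdb.Server()["name_of_db"]  # placeholder database name
results = db.view("name_of_design_doc/name_of_view")  # couchdb-python expects "design/view"
page_size = 20  # say, 20
page_no = 1  # e.g. 1 = page 1, 2 = page 2, etc.
begin = ((page_no - 1) * page_size) + 1
end = begin + page_size - 1  # CouchDB's endkey is inclusive
my_page = results[begin:end]  # slicing becomes startkey=begin, endkey=end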

and then you can iterate through my_page.
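The step 3 arithmetic can be pulled into a small helper, shown here as a sketch with a hypothetical name (`page_key_range`). One detail is worth checking: CouchDB treats `endkey` as inclusive by default (unless `inclusive_end=false` is passed). A range of exactly `page_size` rows therefore ends at `begin + page_size - 1`. With `begin + page_size` as the end key, each page would also pick up the first row of the next page.

```python
from typing import Tuple


def page_key_range(page_no: int, page_size: int) -> Tuple[int, int]:
    """Inclusive (startkey, endkey) when both page_seq and page_no start at 1."""
    begin = (page_no - 1) * page_size + 1
    end = begin + page_size - 1  # inclusive end key, so stop one short
    return begin, end


start, end = page_key_range(3, 20)  # (41, 60)
# With couchdb-python (not run here):
# db.view("design_name/view_name", startkey=start, endkey=end)
```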

A clear drawback of this is that page_seq assumes you're not filtering the data set in your view; you'll quickly run into trouble if you try to make this work with an arbitrary query.
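To see why filtering breaks the arithmetic: if the map function emits only some documents, the page_seq values in the view have gaps, and a key range sized for `page_size` rows returns fewer rows. The in-memory sketch below uses a hypothetical `page_by_seq` helper, with `keep` standing in for a filtering map function. When only even-numbered documents are kept, page 1 shrinks from 10 rows to 5.

```python
from typing import Callable, Dict, List


def page_by_seq(docs: List[Dict], page_no: int, page_size: int,
                keep: Callable[[Dict], bool] = lambda d: True) -> List[Dict]:
    """Select one page by page_seq arithmetic from an in-memory stand-in for the view."""
    begin = (page_no - 1) * page_size + 1
    end = begin + page_size - 1  # inclusive, like CouchDB's endkey
    rows = sorted((d for d in docs if keep(d)), key=lambda d: d["page_seq"])
    return [d for d in rows if begin <= d["page_seq"] <= end]


docs = [{"page_seq": i, "even": i % 2 == 0} for i in range(1, 41)]
full_page = page_by_seq(docs, 1, 10)                              # 10 rows
filtered_page = page_by_seq(docs, 1, 10, keep=lambda d: d["even"])  # only 5 rows
```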

Comments/improvements welcome.

Owen S. answered Sep 23 '22 11:09
