What pagination schemes can handle rapidly-changing content lists?

Tags: database, pagination, complex-event-processing

Pagination is hard when your content rankings can change quickly, and even harder when those rankings differ per-user. (Let's treat infinite scroll as a type of pagination where the links are invisible.) There are two hard problems: newly-added content at the top, and reranked content.

Let's forget about newly-added content, and accept that you'll have to refresh page 1 to see it. Let's also pretend we're doing pure ORDER BY position; if you're ordering by something else, you might have to use window functions. Each page holds 4 rows of animals. They start out:

    +----+----------+-----------+
    | id | position^|  animal   |
    +----+----------+-----------+
    |  1 |        1 | Alpacas   |
    |  2 |        2 | Bats      |
    |  3 |        3 | Cows      |
    |  4 |        4 | Dogs      |
    |  5 |        5 | Elephants |
    |  6 |        6 | Foxes     |
    |  7 |        7 | Giraffes  |
    |  8 |        8 | Horses    |
    +----+----------+-----------+

After we fetch page 1, and before we fetch page 2, a lot of items move around. The DB is now:

    +----+----------+-----------+
    | id | position^|  animal   |
    +----+----------+-----------+
    |  4 |        1 | Dogs      |
    |  2 |        2 | Bats      |
    |  1 |        3 | Alpacas   |
    |  5 |        4 | Elephants |
    |  6 |        5 | Foxes     |
    |  7 |        6 | Giraffes  |
    |  3 |        7 | Cows      |
    |  8 |        8 | Horses    |
    +----+----------+-----------+

There are three common approaches:

Offset/limit approach

This is the typical naive approach; in Rails, it's how will_paginate and Kaminari work. If I want to fetch page 2, I'll do

    SELECT *
    FROM animals
    ORDER BY animals.position
    OFFSET ((:page_num - 1) * :page_size)
    LIMIT :page_size;

which gets rows 5-8. I'll never see Elephants, and I'll see Cows twice.
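To make the failure concrete, here is a minimal sketch that replays the scenario with Python's built-in sqlite3 module. The helper names (make_db, rerank, fetch_page) are made up for illustration, and SQLite stands in for whatever database you actually use.

```python
import sqlite3

ANIMALS = ["Alpacas", "Bats", "Cows", "Dogs",
           "Elephants", "Foxes", "Giraffes", "Horses"]
# id -> new position, matching the second table in the question.
RERANKED = {4: 1, 2: 2, 1: 3, 5: 4, 6: 5, 7: 6, 3: 7, 8: 8}


def make_db():
    """Build the starting table: id == position for all 8 animals."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE animals (id INTEGER PRIMARY KEY, position INTEGER, animal TEXT)"
    )
    conn.executemany(
        "INSERT INTO animals VALUES (?, ?, ?)",
        [(i, i, name) for i, name in enumerate(ANIMALS, start=1)],
    )
    return conn


def rerank(conn):
    """Apply the reshuffle that happens between the two page fetches."""
    conn.executemany(
        "UPDATE animals SET position = ? WHERE id = ?",
        [(pos, id_) for id_, pos in RERANKED.items()],
    )


def fetch_page(conn, page_num, page_size=4):
    """Plain offset/limit pagination against the live table."""
    rows = conn.execute(
        "SELECT animal FROM animals ORDER BY position LIMIT ? OFFSET ?",
        (page_size, (page_num - 1) * page_size),
    )
    return [r[0] for r in rows]


conn = make_db()
page1 = fetch_page(conn, 1)
rerank(conn)
page2 = fetch_page(conn, 2)
print(page1)  # ['Alpacas', 'Bats', 'Cows', 'Dogs']
print(page2)  # ['Foxes', 'Giraffes', 'Cows', 'Horses']
```

Cows shows up on both pages and Elephants on neither, because the offset is applied to the new ordering rather than the one page 1 was cut from.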

Last seen ID approach

Reddit takes a different approach. Instead of calculating the first row based on page size, the client tracks the ID of the last item you've seen, like a bookmark. When you hit "next", they start looking from that bookmark onward:

    SELECT *
    FROM animals
    WHERE position > (
      SELECT position
      FROM animals
      WHERE id = :last_seen_id
    )
    ORDER BY position
    LIMIT :page_size;

In some cases, this works better than page/offset. But in our case, Dogs, the last-seen post, zoomed right to #1. So the client sends up ?last_seen_id=4, and my page 2 is Bats, Alpacas, Elephants and Foxes. I haven't missed any animals, but I saw Bats and Alpacas twice.
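The same scenario can be replayed for the bookmark query. This is only a sketch using Python's sqlite3 module; fetch_after is a hypothetical helper name, and the table setup mirrors the example tables above.

```python
import sqlite3

ANIMALS = ["Alpacas", "Bats", "Cows", "Dogs",
           "Elephants", "Foxes", "Giraffes", "Horses"]
# id -> new position after the reshuffle.
RERANKED = {4: 1, 2: 2, 1: 3, 5: 4, 6: 5, 7: 6, 3: 7, 8: 8}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE animals (id INTEGER PRIMARY KEY, position INTEGER, animal TEXT)"
)
conn.executemany(
    "INSERT INTO animals VALUES (?, ?, ?)",
    [(i, i, name) for i, name in enumerate(ANIMALS, start=1)],
)


def fetch_after(conn, last_seen_id, page_size=4):
    """Return the rows ranked after the bookmarked row's *current* position."""
    rows = conn.execute(
        """
        SELECT id, animal FROM animals
        WHERE position > (SELECT position FROM animals WHERE id = ?)
        ORDER BY position
        LIMIT ?
        """,
        (last_seen_id, page_size),
    )
    return rows.fetchall()


# Page 1 is fetched before the reshuffle; the client remembers the last id.
page1 = conn.execute(
    "SELECT id, animal FROM animals ORDER BY position LIMIT 4"
).fetchall()
last_seen_id = page1[-1][0]  # 4, i.e. Dogs

conn.executemany(
    "UPDATE animals SET position = ? WHERE id = ?",
    [(pos, id_) for id_, pos in RERANKED.items()],
)

page2 = fetch_after(conn, last_seen_id)
print([name for _, name in page2])  # ['Bats', 'Alpacas', 'Elephants', 'Foxes']
```

The bookmark is only as stable as the bookmarked row: once Dogs jumps to position 1, "everything after Dogs" includes rows that were already shown.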

Server side state

Hacker News (and our site, right now) solves this with server-side continuations; they store the entire result set for you (or at least several pages in advance?), and the "More" link references that continuation. When I fetch page 2, I ask for "page 2 of my original query". It uses the same offset/limit calculation, but since it's against the original query, I simply don't care that things have now moved around. I see Elephants, Foxes, Giraffes, and Horses. No dups, no missed items.

The downside is that we have to store a lot of state on the server. On HN, that's stored in RAM, and in reality those continuations often expire before you can press the "More" button, forcing you to go all the way back to page 1 to find a valid link. In most applications, you can store that in memcached, or even in the database itself (using your own table, or in Oracle or PostgreSQL, using holdable cursors). Depending on your application, there might be a performance hit; in PostgreSQL, at least, you have to find a way to hit the right database connection again, which requires a lot of sticky-state or some clever back-end routing.
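As a rough illustration of the continuation idea (not how HN actually implements it), the sketch below snapshots the ordered IDs under a random token and pages against that snapshot. The in-memory dict stands in for memcached or a table, and start_query and fetch_page are hypothetical names.

```python
import sqlite3
import uuid

ANIMALS = ["Alpacas", "Bats", "Cows", "Dogs",
           "Elephants", "Foxes", "Giraffes", "Horses"]
RERANKED = {4: 1, 2: 2, 1: 3, 5: 4, 6: 5, 7: 6, 3: 7, 8: 8}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE animals (id INTEGER PRIMARY KEY, position INTEGER, animal TEXT)"
)
conn.executemany(
    "INSERT INTO animals VALUES (?, ?, ?)",
    [(i, i, name) for i, name in enumerate(ANIMALS, start=1)],
)

# Server-side state: token -> ordered list of ids as of the first request.
# A real system would need expiry, which is exactly the cost described above.
continuations = {}


def start_query(conn):
    """Snapshot the current ordering and hand back a continuation token."""
    ids = [r[0] for r in conn.execute("SELECT id FROM animals ORDER BY position")]
    token = uuid.uuid4().hex
    continuations[token] = ids
    return token


def fetch_page(conn, token, page_num, page_size=4):
    """Offset/limit against the stored snapshot, not the live ordering."""
    start = (page_num - 1) * page_size
    ids = continuations[token][start:start + page_size]
    if not ids:
        return []
    marks = ",".join("?" * len(ids))
    found = dict(
        conn.execute(f"SELECT id, animal FROM animals WHERE id IN ({marks})", ids)
    )
    # Re-apply the snapshot order; skip rows deleted since the snapshot.
    return [found[i] for i in ids if i in found]


token = start_query(conn)
page1 = fetch_page(conn, token, 1)
conn.executemany(
    "UPDATE animals SET position = ? WHERE id = ?",
    [(pos, id_) for id_, pos in RERANKED.items()],
)
page2 = fetch_page(conn, token, 2)
print(page1)  # ['Alpacas', 'Bats', 'Cows', 'Dogs']
print(page2)  # ['Elephants', 'Foxes', 'Giraffes', 'Horses']
```

Storing only IDs rather than full rows keeps the snapshot small, at the price of one extra lookup per page.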

Are these the only three possible approaches? If not, are there computer-science concepts that would give me Google juice to read about this? Are there ways to approximate the continuation approach without storing the entire result set? Long term, there are complex event-streaming/point-in-time systems, where "the result set as of the moment I fetched page 1" is forever derivable. Short of that... ?

550

asked Mar 07 '12 13:03

Jay Levitt

2 Answers

Oracle handles this nicely. As long as a cursor is open, you can fetch as many times as necessary and your results will always reflect the point in time at which the cursor was opened. It uses data from the undo logs to virtually rollback changes that were committed after the cursor was opened.

It will work as long as the required rollback data is still available. Eventually the logs get recycled and the rollback data is no longer available, so there is some limit, depending on the log space, system activity, etc.

Unfortunately (IMO), I don't know of any other DB that works like this. The other databases I've worked with use locks to ensure read consistency, which is problematic if you want read consistency over more than a very short duration.

108

answered Oct 19 '22 23:10

Todd Gibson

Solution 1: "the hacky solution"

One solution is for your client to keep track of the content it has already seen, as a list of IDs for example. Each time you need another page, you add this ID list to the parameters of your server call. Your server can then order the content, remove the already-seen content and apply the offset to get the right page.

I would not recommend it though, and I insist on "hacky". I'm only writing it down here because it's quick and could fit some needs. Here are the bad things I can think of:

1) It needs some work on the client side to get it right (what does "already seen" mean in my sentence above? what if I go to a previous page?)

2) The resulting order doesn't reflect your true ordering policy. An item could be displayed on page 2 although the policy should have put it on page 1, which could confuse the user. Take the example of Stack Overflow with its former ordering policy, that is, most upvoted answers first. We could have an answer with 6 upvotes on page 2 while an answer with 4 upvotes is on page 1. This happens when the 2 or more extra upvotes arrive while the user is still on page 1. That can be surprising for the user.
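A minimal sketch of this "exclude what I've seen" idea, using Python's sqlite3 module and the animals table from the question. The function name fetch_unseen is hypothetical, and the sketch assumes the client sends every ID it has seen, so no extra offset is needed.

```python
import sqlite3

ANIMALS = ["Alpacas", "Bats", "Cows", "Dogs",
           "Elephants", "Foxes", "Giraffes", "Horses"]
RERANKED = {4: 1, 2: 2, 1: 3, 5: 4, 6: 5, 7: 6, 3: 7, 8: 8}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE animals (id INTEGER PRIMARY KEY, position INTEGER, animal TEXT)"
)
conn.executemany(
    "INSERT INTO animals VALUES (?, ?, ?)",
    [(i, i, name) for i, name in enumerate(ANIMALS, start=1)],
)


def fetch_unseen(conn, seen_ids, page_size=4):
    """Return the top-ranked rows whose ids the client has not seen yet."""
    marks = ",".join("?" * len(seen_ids))
    where = f"WHERE id NOT IN ({marks})" if seen_ids else ""
    rows = conn.execute(
        f"SELECT id, animal FROM animals {where} ORDER BY position LIMIT ?",
        (*seen_ids, page_size),
    )
    return rows.fetchall()


seen = []  # client-side state, sent along with every request
page1 = fetch_unseen(conn, seen)
seen += [id_ for id_, _ in page1]

conn.executemany(
    "UPDATE animals SET position = ? WHERE id = ?",
    [(pos, id_) for id_, pos in RERANKED.items()],
)

page2 = fetch_unseen(conn, seen)
print([name for _, name in page2])  # ['Elephants', 'Foxes', 'Giraffes', 'Horses']
```

Note that the seen list grows with every page, so the request and the NOT IN clause get larger the deeper the user scrolls.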

Solution 2: "the client solution"

It's basically the client-side equivalent of the solution you call "server-side state". It's therefore useful only if keeping track of the full order on the server side is not convenient. It works if the item list is not infinite.

1) Call your server to get the full (finite) ordered list of IDs, plus the number of items per page.
2) Save it on the client side.
3) Retrieve items directly through the IDs of your content.
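The three steps above might look roughly like this. It is a sketch only: get_order and get_items are hypothetical stand-ins for two stateless server endpoints, the Client class plays the browser or app, and sqlite3 stands in for the real database.

```python
import sqlite3

ANIMALS = ["Alpacas", "Bats", "Cows", "Dogs",
           "Elephants", "Foxes", "Giraffes", "Horses"]
RERANKED = {4: 1, 2: 2, 1: 3, 5: 4, 6: 5, 7: 6, 3: 7, 8: 8}
PAGE_SIZE = 4

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE animals (id INTEGER PRIMARY KEY, position INTEGER, animal TEXT)"
)
conn.executemany(
    "INSERT INTO animals VALUES (?, ?, ?)",
    [(i, i, name) for i, name in enumerate(ANIMALS, start=1)],
)


def get_order(conn):
    """Server call 1: the full ordered id list plus the page size."""
    ids = [r[0] for r in conn.execute("SELECT id FROM animals ORDER BY position")]
    return {"ids": ids, "page_size": PAGE_SIZE}


def get_items(conn, ids):
    """Server call 2: fetch specific items by id, in the order requested."""
    if not ids:
        return []
    marks = ",".join("?" * len(ids))
    found = dict(
        conn.execute(f"SELECT id, animal FROM animals WHERE id IN ({marks})", ids)
    )
    return [found[i] for i in ids if i in found]


class Client:
    """Holds the ordering snapshot so the server can stay stateless."""

    def __init__(self, conn):
        self.conn = conn
        order = get_order(conn)          # step 1: fetch the order
        self.ids = order["ids"]          # step 2: save it client-side
        self.page_size = order["page_size"]

    def page(self, n):
        start = (n - 1) * self.page_size
        # step 3: retrieve items directly by id
        return get_items(self.conn, self.ids[start:start + self.page_size])


client = Client(conn)
page1 = client.page(1)
conn.executemany(
    "UPDATE animals SET position = ? WHERE id = ?",
    [(pos, id_) for id_, pos in RERANKED.items()],
)
page2 = client.page(2)
print(page2)  # ['Elephants', 'Foxes', 'Giraffes', 'Horses']
```

The trade-off is the initial payload: the whole ID list travels to the client up front, which is why this only suits finite, reasonably small lists.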

answered Oct 19 '22 23:10

Aurelien Porte
