Handling paging with changing sort orders

I'm creating a RESTful web service (in Golang) which pulls a set of rows from the database and returns them to a client (smartphone app or web application). The service needs to provide paging. The only problem is that this data is sorted on a regularly changing "computed" column (for example, the number of "thumbs up" or "thumbs down" a piece of content on a website has), so rows can move between pages from one client request to the next.

I've looked at a few PostgreSQL features that I could potentially use to help me solve this problem, but nothing really seems to be a very good solution.

  • Materialized Views: to hold "stale" data which is only updated every once in a while. This doesn't really solve the problem, as the data would still jump around if the user happens to be paging through the data when the Materialized View is updated.
  • Cursors: created for each client session and held between requests. This seems like it would be a nightmare if there are a lot of concurrent sessions at once (which there will be); a rough sketch of what this involves follows below.
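
For reference, here is a minimal sketch of the cursor-per-session approach in Go, using the standard database/sql package. The lib/pq driver choice and the "content"/"score" table and column names are assumptions for illustration, not part of the question:

    package main

    import (
        "context"
        "database/sql"
        "fmt"

        _ "github.com/lib/pq" // assumed PostgreSQL driver
    )

    // session pins an open transaction (and therefore a pooled connection)
    // between HTTP requests so the server-side cursor stays valid.
    type session struct {
        tx *sql.Tx
    }

    // openCursor begins a transaction and declares a cursor over the ranked
    // content. The "content" table and "score" column are made-up names.
    func openCursor(ctx context.Context, db *sql.DB) (*session, error) {
        tx, err := db.BeginTx(ctx, nil)
        if err != nil {
            return nil, err
        }
        if _, err := tx.ExecContext(ctx,
            `DECLARE content_cur CURSOR FOR
             SELECT id, title, score FROM content ORDER BY score DESC`); err != nil {
            tx.Rollback()
            return nil, err
        }
        return &session{tx: tx}, nil
    }

    // nextPage fetches the next n rows. FETCH is a utility command, so the
    // row count is interpolated rather than bound as a query parameter.
    func (s *session) nextPage(ctx context.Context, n int) (*sql.Rows, error) {
        return s.tx.QueryContext(ctx,
            fmt.Sprintf("FETCH FORWARD %d FROM content_cur", n))
    }

Every concurrent reader holds a connection and an open transaction for the life of its session, which is exactly why this gets painful with many clients.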

Does anybody have any suggestions on how to handle this, either on the client side or database side? Is there anything I can really do, or is an issue such as this normally just remedied by the clients consuming the data?

Edit: I should mention that the smartphone app allows users to view more pieces of data through "infinite scrolling", so it keeps track of its own list of data client-side.

Asked Oct 04 '14 by jstol


1 Answer

This is a problem without a perfectly satisfactory solution because you're trying to combine essentially incompatible requirements:

  • Send only the required amount of data to the client on-demand, i.e. you can't download the whole dataset then paginate it client-side.

  • Minimise amount of per-client state that the server must keep track of, for scalability with large numbers of clients.

  • Maintain separate, stable pagination state for each client, so each client sees a consistent ordering.

This is a "pick any two" kind of situation. You have to compromise: accept that you can't keep each client's pagination state exactly right, accept that you have to download a big dataset to the client, or accept that you have to use a huge amount of server resources to maintain client state.

There are variations within those that mix the various compromises, but that's what it all boils down to.

For example, some people will send the client some extra data, enough to satisfy most client requirements. If the client exceeds that, then it gets broken pagination.

Some systems will cache client state for a short period (with short-lived unlogged tables, tempfiles, or whatever), but expire it quickly, so if the client isn't constantly asking for fresh data it gets broken pagination.
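
To make that concrete, here is a sketch of the unlogged-table variant, continuing the Go example above. The snapshot naming scheme is invented, and the per-client token is assumed to be validated as strictly alphanumeric before use, since it is interpolated into an identifier:

    // snapshotForClient freezes the client's current view of the ranking into
    // a short-lived unlogged table, so later pages read from a stable copy.
    func snapshotForClient(ctx context.Context, db *sql.DB, token string) error {
        q := fmt.Sprintf(
            `CREATE UNLOGGED TABLE IF NOT EXISTS snap_%s AS
             SELECT row_number() OVER (ORDER BY score DESC) AS rn,
                    id, title, score
             FROM content`, token)
        _, err := db.ExecContext(ctx, q)
        return err
    }

    // pageFromSnapshot reads a stable page from the frozen copy. A background
    // job would DROP snapshots whose clients have gone idle, at which point a
    // returning client gets fresh data and its pagination changes.
    func pageFromSnapshot(ctx context.Context, db *sql.DB, token string,
        offset, limit int) (*sql.Rows, error) {
        q := fmt.Sprintf(
            `SELECT id, title, score FROM snap_%s
             WHERE rn > %d ORDER BY rn LIMIT %d`, token, offset, limit)
        return db.QueryContext(ctx, q)
    }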

Etc.

See also:

  • How to provide an API client with 1,000,000 database results?
  • Using "Cursors" for paging in PostgreSQL
  • Iterate over large external postgres db, manipulate rows, write output to rails postgres db
  • offset/limit performance optimization
  • If PostgreSQL count(*) is always slow how to paginate complex queries?
  • How to return sample row from database one by one

I'd probably implement a hybrid solution of some form (a rough sketch follows the list), like:

  • Using a cursor, read and immediately send the first part of the data to the client.

  • Immediately fetch enough extra data from the cursor to satisfy 99% of clients' requirements. Store it to a fast, unsafe cache like memcached, Redis, BigMemory, EHCache, whatever under a key that'll let me retrieve it for later requests by the same client. Then close the cursor to free the DB resources.

  • Expire the cache on a least-recently-used basis, so if the client doesn't keep reading fast enough they have to go get a fresh set of data from the DB, and the pagination changes.

  • If the client wants more results than the vast majority of its peers, pagination will change at some point as you switch to reading directly from the DB rather than the cache, or generate a new, bigger cached dataset.
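
A minimal sketch of that hybrid flow, again in Go. The go-redis client, the 500-row prefetch window, the 5-minute TTL, and all table and field names are illustrative assumptions; for simplicity it prefetches with a single LIMIT query rather than an explicit cursor, and the LRU behaviour would come from the cache's own eviction settings (e.g. Redis maxmemory-policy allkeys-lru):

    package main

    import (
        "context"
        "database/sql"
        "encoding/json"
        "time"

        "github.com/redis/go-redis/v9" // assumed cache client
    )

    type item struct {
        ID    int64  `json:"id"`
        Title string `json:"title"`
        Score int64  `json:"score"`
    }

    const (
        pageSize    = 20
        prefetchMax = 500             // "enough for 99% of clients" in this sketch
        cacheTTL    = 5 * time.Minute // expire idle clients quickly
    )

    // firstPage reads page 0 plus a prefetch window in one query, caches the
    // whole window under the client's key, and returns only the first page.
    func firstPage(ctx context.Context, db *sql.DB, rdb *redis.Client,
        clientID string) ([]item, error) {
        rows, err := db.QueryContext(ctx,
            `SELECT id, title, score FROM content ORDER BY score DESC LIMIT $1`,
            prefetchMax)
        if err != nil {
            return nil, err
        }
        defer rows.Close()

        var window []item
        for rows.Next() {
            var it item
            if err := rows.Scan(&it.ID, &it.Title, &it.Score); err != nil {
                return nil, err
            }
            window = append(window, it)
        }
        buf, err := json.Marshal(window)
        if err != nil {
            return nil, err
        }
        if err := rdb.Set(ctx, "window:"+clientID, buf, cacheTTL).Err(); err != nil {
            return nil, err
        }
        if len(window) > pageSize {
            window = window[:pageSize]
        }
        return window, nil
    }

    // laterPage serves page n from the cached window. On a miss (TTL expired,
    // or the client outran the prefetch) ok is false and the caller falls back
    // to a fresh DB query, accepting that pagination may visibly shift.
    func laterPage(ctx context.Context, rdb *redis.Client, clientID string,
        n int) (items []item, ok bool) {
        buf, err := rdb.Get(ctx, "window:"+clientID).Bytes()
        if err != nil {
            return nil, false
        }
        var window []item
        if json.Unmarshal(buf, &window) != nil {
            return nil, false
        }
        lo, hi := n*pageSize, (n+1)*pageSize
        if lo >= len(window) {
            return nil, false
        }
        if hi > len(window) {
            hi = len(window)
        }
        return window[lo:hi], true
    }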

That way most clients won't notice pagination issues and you don't have to send vast amounts of data to most clients, but you won't melt your DB server. However, you need a big boofy cache to get away with this. Its practicality depends on whether your clients can cope with pagination breaking - if it's simply not acceptable to break pagination, then you're stuck with doing it DB-side with cursors, temp tables, copying the whole result set at the first request, etc. It also depends on the dataset size and how much data each client usually requires.

Answered Sep 23 '22 by Craig Ringer