I'm creating a RESTful web service (in Golang) which pulls a set of rows from the database and returns them to a client (smartphone app or web application). The service needs to be able to provide paging. The only problem is that this data is sorted on a regularly changing "computed" column (for example, the number of "thumbs up" or "thumbs down" a piece of content on a website has), so rows can jump between pages in between a client's requests.

I've looked at a few PostgreSQL features that I could potentially use to help me solve this problem, but none of them really seems to be a very good solution.

<ul>
<li>Materialized Views: to hold "stale" data which is only updated every once in a while. This doesn't really solve the problem, as the data would still jump around if the user happens to be paging through the data when the Materialized View is updated.</li>
<li>Cursors: created for each client session and held between requests. This seems like it would be a nightmare if there are a lot of concurrent sessions at once (which there will be).</li>
</ul>

Does anybody have any suggestions on how to handle this, either on the client side or database side? Is there anything I can really do, or is an issue such as this normally just remedied by the clients consuming the data?

Edit: I should mention that the smartphone app allows users to view more pieces of data through "infinite scrolling", so it keeps track of its own list of data client-side.

This is a problem without a perfectly satisfactory solution because you're trying to combine essentially incompatible requirements:

<ul>
<li>Send only the required amount of data to the client on-demand, i.e. you can't download the whole dataset then paginate it client-side.</li>
<li>Minimise the amount of per-client state that the server must keep track of, for scalability with large numbers of clients.</li>
<li>Maintain different state for each client.</li>
</ul>

This is a "pick any two" kind of situation. You have to compromise: accept that you can't keep each client's pagination state exactly right, accept that you have to download a big data set to the client, or accept that you have to use a huge amount of server resources to maintain client state.

There are variations within those that mix the various compromises, but that's what it all boils down to.

For example, some people will send the client some extra data, enough to satisfy most client requirements. If the client exceeds that, then it gets broken pagination.

Some systems will cache client state for a short period (with short-lived unlogged tables, tempfiles, or whatever), but expire it quickly, so if the client isn't constantly asking for fresh data it gets broken pagination.

Etc.

See also:

<ul>
<li>How to provide an API client with 1,000,000 database results?</li>
<li>Using "Cursors" for paging in PostgreSQL</li>
<li>Iterate over large external postgres db, manipulate rows, write output to rails postgres db</li>
<li>offset/limit performance optimization</li>
<li>If PostgreSQL count(*) is always slow how to paginate complex queries?</li>
<li>How to return sample row from database one by one</li>
</ul>

I'd probably implement a hybrid solution of some form, like:

<ul>
<li>Using a cursor, read and immediately send the first part of the data to the client.</li>
<li>Immediately fetch enough extra data from the cursor to satisfy 99% of clients' requirements. Store it in a fast, unsafe cache like memcached, Redis, BigMemory, EHCache, or whatever, under a key that'll let me retrieve it for later requests by the same client. Then close the cursor to free the DB resources.</li>
<li>Expire the cache on a least-recently-used basis, so if the client doesn't keep reading fast enough they have to go get a fresh set of data from the DB, and the pagination changes.</li>
<li>If the client wants more results than the vast majority of its peers, pagination will change at some point as you switch to reading directly from the DB rather than the cache, or generate a new, bigger cached dataset.</li>
</ul>

That way most clients won't notice pagination issues and you don't have to send vast amounts of data to most clients, but you won't melt your DB server. However, you need a big boofy cache to get away with this.

Whether it's practical depends on whether your clients can cope with pagination breaking. If it's simply not acceptable to break pagination, then you're stuck with doing it DB-side with cursors, temp tables, copying the whole result set at first request, etc. It also depends on the data set size and how much data each client usually requires.
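As a rough illustration of the hybrid approach, here is a minimal Go sketch of the caching step. It is only a sketch under stated assumptions: the `Row` type, the `SnapshotCache` name, and the `"client-abc"` key are hypothetical, an in-process map stands in for memcached or Redis, and a simple idle timeout stands in for true least-recently-used eviction. The database read itself (cursor, fetch, close) is left out; the rows are assumed to have been fetched already in sorted order.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Row is a hypothetical result row; the real columns are not given in the question.
type Row struct {
	ID    int64
	Score int
}

// snapshot is a frozen, already-sorted result set for one client.
type snapshot struct {
	rows     []Row
	lastUsed time.Time
}

// SnapshotCache is an in-memory stand-in for an external cache such as Redis.
// Entries idle for longer than ttl are dropped on the next access.
type SnapshotCache struct {
	mu    sync.Mutex
	ttl   time.Duration
	snaps map[string]*snapshot
}

func NewSnapshotCache(ttl time.Duration) *SnapshotCache {
	return &SnapshotCache{ttl: ttl, snaps: make(map[string]*snapshot)}
}

// Put stores the prefetched rows under a per-client key.
func (c *SnapshotCache) Put(key string, rows []Row) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.snaps[key] = &snapshot{rows: rows, lastUsed: time.Now()}
}

// Page returns rows [offset, offset+limit) from the client's snapshot.
// ok is false when the snapshot is missing, expired, or too short; the caller
// then has to query the DB again and accept that pagination may shift.
func (c *SnapshotCache) Page(key string, offset, limit int) (rows []Row, ok bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	s, found := c.snaps[key]
	if !found {
		return nil, false
	}
	if time.Since(s.lastUsed) > c.ttl {
		delete(c.snaps, key) // client read too slowly; snapshot is gone
		return nil, false
	}
	if offset < 0 || limit <= 0 || offset >= len(s.rows) {
		return nil, false
	}
	s.lastUsed = time.Now()
	end := offset + limit
	if end > len(s.rows) {
		end = len(s.rows)
	}
	return s.rows[offset:end], true
}

func main() {
	cache := NewSnapshotCache(2 * time.Minute)
	// Pretend these rows came from a single cursor read, sorted by score.
	cache.Put("client-abc", []Row{{1, 90}, {2, 80}, {3, 70}, {4, 60}, {5, 50}})

	page, ok := cache.Page("client-abc", 2, 2)
	fmt.Println(page, ok) // [{3 70} {4 60}] true

	_, ok = cache.Page("client-abc", 10, 2)
	fmt.Println(ok) // false: beyond the cached rows, fall back to the DB
}
```

A request handler would call `Page` first and only go back to the database (and call `Put` with a fresh result set) when `ok` is false. That is the point at which a slow or unusually hungry client sees the pagination change.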

Handling paging with changing sort orders

Tags:

rest

sql

postgresql

pagination

go

asked Oct 04 '14 00:10

jstol

1 Answer

Craig Ringer
