I am making an API over HTTP that fetches many rows from PostgreSQL with pagination. In ordinary cases, I usually implement such pagination through naive <code>OFFET</code>/<code>LIMIT</code> clause. However, there are some special requirements in this case: <ul> <li>A lot of rows there are so that I believe users cannot reach the end (imagine Twitter timeline).</li> <li>Pages does not have to be randomly accessible but only sequentially.</li> <li>API would return a URL which contains a cursor token that directs to the page of continuous chunks.</li> <li>Cursor tokens have not to exist permanently but for some time.</li> <li>Its ordering has frequent fluctuating (like Reddit rankings), however continuous cursors should keep their consistent ordering.</li> </ul> How can I achieve the mission? I am ready to change my whole database schema for it!

Assuming it's only the ordering of the results that fluctuates and not the data in the rows, Fredrik's answer makes sense. However, I'd suggest the following additions: <ul> <li> store the id list in a postgresql table using the array type rather than in memory. Doing it in memory, unless you carefully use something like redis with auto expiry and memory limits, is setting yourself up for a DOS memory consumption attack. I imagine it would look something like this: <pre class="prettyprint"><code>create table foo_paging_cursor ( cursor_token ..., -- probably a uuid is best or timestamp (see below) result_ids integer[], -- or text[] if you have non-integer ids expiry_time TIMESTAMP ); </code></pre> </li> <li>You need to decide if the cursor_token and result_ids can be shared between users to reduce your storage needs and the time needed to run the initial query per user. If they can be shared, chose a cache window, say 1 or 5 minute(s), and then upon a new request create the cache_token for that time period and then check to see if the results ids have already been calculated for that token. If not, add a new row for that token. You should probably add a lock around the check/insert code to handle concurrent requests for a new token. </li> <li>Have a scheduled background job that purges old tokens/results and make sure your client code can handle any errors related to expired/invalid tokens.</li> </ul> Don't even consider using real db cursors for this. Keeping the result ids in Redis lists is another way to handle this (see the LRANGE command), but be careful with expiry and memory usage if you go down that path. Your Redis key would be the cursor_token and the ids would be the members of the list.

I know absolutely nothing about PostgreSQL, but I'm a pretty decent SQL Server developer, so I'd like to take a shot at this anyway :) How many rows/pages do you expect a user would maximally browse through per session? For instance, if you expect a user to page through a maximum of 10 pages for each session [each page containing 50 rows], you could make take that max, and setup the webservice so that when the user requests the first page, you cache 10*50 rows (or just the Id:s for the rows, depends on how much memory/simultaneous users you got). This would certainly help speed up your webservice, in more ways than one. And it's quite easy to implement to. So: <ul> <li>When a user requests data from page #1. Run a query (complete with order by, join checks, etc), store all the id:s into an array (but a maximum of 500 ids). Return datarows that corresponds to id:s in the array at positions 0-9.</li> <li>When the user requests page #2-10. Return datarows that corresponds to id:s in the array at posisions (page-1)*50 - (page)*50-1. </li> </ul> You could also bump up the numbers, an array of 500 int:s would only occupy 2K of memory, but it also depends on how fast you want your initial query/response. I've used a similar technique on a live website, and when the user continued past page 10, I just switched to queries. I guess another solution would be to continue to expand/fill the array. (Running the query again, but excluding already included id:s). Anyway, hope this helps!

Best way to fetch the continuous list with PostgreSQL in web

Tags:

http

postgresql

pagination

continuous

database-cursor

I am making an API over HTTP that fetches many rows from PostgreSQL with pagination. In ordinary cases, I usually implement such pagination through naive OFFET/LIMIT clause. However, there are some special requirements in this case:

A lot of rows there are so that I believe users cannot reach the end (imagine Twitter timeline).
Pages does not have to be randomly accessible but only sequentially.
API would return a URL which contains a cursor token that directs to the page of continuous chunks.
Cursor tokens have not to exist permanently but for some time.
Its ordering has frequent fluctuating (like Reddit rankings), however continuous cursors should keep their consistent ordering.

How can I achieve the mission? I am ready to change my whole database schema for it!

374

asked Oct 12 '11 21:10

minhee

2 Answers

Assuming it's only the ordering of the results that fluctuates and not the data in the rows, Fredrik's answer makes sense. However, I'd suggest the following additions:

store the id list in a postgresql table using the array type rather than in memory. Doing it in memory, unless you carefully use something like redis with auto expiry and memory limits, is setting yourself up for a DOS memory consumption attack. I imagine it would look something like this:

Click to copy
```
create table foo_paging_cursor (
  cursor_token ..., -- probably a uuid is best or timestamp (see below)
  result_ids integer[], -- or text[] if you have non-integer ids
  expiry_time TIMESTAMP
);
```
You need to decide if the cursor_token and result_ids can be shared between users to reduce your storage needs and the time needed to run the initial query per user. If they can be shared, chose a cache window, say 1 or 5 minute(s), and then upon a new request create the cache_token for that time period and then check to see if the results ids have already been calculated for that token. If not, add a new row for that token. You should probably add a lock around the check/insert code to handle concurrent requests for a new token.
Have a scheduled background job that purges old tokens/results and make sure your client code can handle any errors related to expired/invalid tokens.

Don't even consider using real db cursors for this.

Keeping the result ids in Redis lists is another way to handle this (see the LRANGE command), but be careful with expiry and memory usage if you go down that path. Your Redis key would be the cursor_token and the ids would be the members of the list.

answered Oct 19 '22 15:10

Tavis Rudd

I know absolutely nothing about PostgreSQL, but I'm a pretty decent SQL Server developer, so I'd like to take a shot at this anyway :)

How many rows/pages do you expect a user would maximally browse through per session? For instance, if you expect a user to page through a maximum of 10 pages for each session [each page containing 50 rows], you could make take that max, and setup the webservice so that when the user requests the first page, you cache 10*50 rows (or just the Id:s for the rows, depends on how much memory/simultaneous users you got).

This would certainly help speed up your webservice, in more ways than one. And it's quite easy to implement to. So:

When a user requests data from page #1. Run a query (complete with order by, join checks, etc), store all the id:s into an array (but a maximum of 500 ids). Return datarows that corresponds to id:s in the array at positions 0-9.
When the user requests page #2-10. Return datarows that corresponds to id:s in the array at posisions (page-1)*50 - (page)*50-1.

You could also bump up the numbers, an array of 500 int:s would only occupy 2K of memory, but it also depends on how fast you want your initial query/response.

I've used a similar technique on a live website, and when the user continued past page 10, I just switched to queries. I guess another solution would be to continue to expand/fill the array. (Running the query again, but excluding already included id:s).

Anyway, hope this helps!

answered Oct 19 '22 16:10

Fredrik Johansson

Related questions
                            
                                How to save a string containing single quotes to a text column in PostgreSQL
                            
                                Calculate tuple size
                            
                                Batch Entry 0 insert into PGSQL-Call getNextException to see the cause
                            
                                How to calculate next birthday given a date of birth?
                            
                                Postgres Sequence jumping by 32
                            
                                Django Cache (Redis) auto refresh when data is changed
                            
                                Best way to access and close a postgres database using python dataset
                            
                                Postgres wraps subquery results in parentheses
                            
                                How to use postgreSQL in Laravel Homestead
                            
                                How to run Postgres tests in parallel with NodeJS, Jest and Knex?
                            
                                Getting error from using postgres function "lower" in JPA Named Queries
                            
                                which one is effecient, join queries using sql, or merge queries using pandas?
                            
                                Add timezone to timestamp in PostgreSQL (not convert to this timezone)
                            
                                Django Full Text Search Optimization - Postgres
                            
                                Postgres : ShareLock Deadlock on transaction
                            
                                SQL Find pairs of rows with next best timestamp match
                            
                                count number of rows to be affected before update in trigger
                            
                                PostgreSQL performance on EC2/EBS
                            
                                PHP-MySQL developer transitioning to PostgreSQL. What do I need to know?
                            
                                importing osm file into a postgres/postgis database

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With