 

Using "Cursors" for paging in PostgreSQL [duplicate]

Possible Duplicate:
How to provide an API client with 1,000,000 database results?

I'm wondering if cursors are a good way to implement "paging" in PostgreSQL.

The use case is that we have upwards of 100,000 rows that we'd like to make available to our API clients. We thought a good way to make this happen would be to allow the client to request the information in batches (pages). The client could request 100 rows at a time. We would return the 100 rows along with a cursor; then, when the client was ready, it could request the next 100 rows using the cursor we sent.

However, I'm a little hazy on how cursors work and exactly how and when cursors should be used:

  • Do the cursors require that a database connection be left open?
  • Do the cursors run inside a transaction, locking resources until they are "closed"?
  • Are there any other "gotchas" that I'm not aware of?
  • Is there another, better way that this situation should be handled?

Thanks so much!

asked Oct 30 '12 by Chris Dutrow



1 Answer

Cursors are a reasonable choice for paging in smaller intranet applications that work with large data sets, but you need to be prepared to discard them after a timeout. Users like to wander off, go to lunch, go on holiday for two weeks, etc, and leave their applications running. If it's a web-based app there's even the question of what "running" is and how to tell if the user is still around.

They are not suitable for large-scale applications with high client counts and clients that come and go near-randomly like in web-based apps or web APIs. I would not recommend using cursors in your application unless you have a fairly small client count and very high request rates ... in which case sending tiny batches of rows will be very inefficient and you should think about allowing range-requests etc instead.

Cursors have several costs. If the cursor is not WITH HOLD you must keep a transaction open. The open transaction can prevent autovacuum from doing its work properly, causing table bloat and other issues. If the cursor is declared WITH HOLD and the transaction isn't held open you have to pay the cost of materializing and storing a potentially large result set - at least, I think that's how hold cursors work. The alternative is just as bad, keeping the transaction implicitly open until the cursor is destroyed and preventing rows from being cleaned up.
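To make the trade-off concrete, here is a sketch of the two declaration styles (the table and cursor names are made up for illustration):

```sql
-- Ordinary cursor: only valid inside the transaction that declared it,
-- so the transaction must stay open between the client's page requests.
BEGIN;
DECLARE page_cur CURSOR FOR SELECT * FROM items ORDER BY id;
FETCH 100 FROM page_cur;
COMMIT;  -- page_cur no longer exists after this

-- WITH HOLD cursor: survives COMMIT, but PostgreSQL materializes the
-- remaining result set at commit time, which can be expensive for
-- large results.
BEGIN;
DECLARE held_cur CURSOR WITH HOLD FOR SELECT * FROM items ORDER BY id;
COMMIT;
FETCH 100 FROM held_cur;
CLOSE held_cur;  -- must be closed explicitly, or it lives until the session ends
```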

Additionally, if you're using cursors you can't hand connections back to a connection pool. You'll need one connection per client. That means more backend resources are used just maintaining session state, and sets a very real upper limit on the number of clients you can handle with a cursor-based approach.

There's also the complexity and overhead of managing a stateful, cursor-based setup as compared to a stateless connection-pooling approach with limit and offset. You need to have your application expire cursors after a timeout or you face potentially unbounded resource use on the server, and you need to keep track of which connections have which cursors for which result sets for which users....

In general, despite the fact that it can be quite inefficient, LIMIT and OFFSET can be the better solution. It can often be better to search the primary key rather than using OFFSET, though.
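The "search the primary key" idea is usually called keyset pagination. A sketch, assuming an items table with an indexed id primary key:

```sql
-- OFFSET-based paging: the server still scans and discards all the
-- skipped rows, so deep pages get progressively slower.
SELECT * FROM items ORDER BY id LIMIT 100 OFFSET 100000;

-- Keyset paging: remember the last id the client saw and seek past it.
-- With an index on id this stays fast no matter how deep the client pages.
SELECT * FROM items WHERE id > 12345 ORDER BY id LIMIT 100;
```

The "cursor" you hand back to the API client is then just the last id of the previous page, which keeps the server completely stateless.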

By the way, you were looking at the documentation for cursors in PL/pgSQL. You want normal SQL-level cursors for this job.
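SQL-level cursors are driven with plain SQL commands rather than PL/pgSQL's OPEN statement. A minimal session (the table and cursor names are illustrative):

```sql
BEGIN;
DECLARE api_page CURSOR FOR
    SELECT id, payload FROM items ORDER BY id;

FETCH FORWARD 100 FROM api_page;  -- first "page" of 100 rows
FETCH FORWARD 100 FROM api_page;  -- next page resumes where the last left off
MOVE FORWARD 100 IN api_page;     -- skip a page without returning rows

CLOSE api_page;
COMMIT;
```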


Do the cursors require that a database connection be left open?

Yes.

Do the cursors run inside a transaction, locking resources until they are "closed"?

Yes, unless they are WITH HOLD, in which case they consume other database resources instead.

Are there any other "gotchas" that I'm not aware of?

Yes, as the above should explain.

answered Sep 21 '22 by Craig Ringer