
Why doesn't PostgreSQL start returning rows immediately?

The following query returns data right away:

SELECT time, value from data order by time limit 100;

Without the limit clause, it takes a long time before the server starts returning rows:

SELECT time, value from data order by time;

I observe this both by using the query tool (psql) and when querying using an API.

Questions/issues:

  • The amount of work the server has to do before starting to return rows should be the same for both select statements. Correct?
  • If so, why is there a delay in case 2?
  • Is there some fundamental RDBMS issue that I do not understand?
  • Is there a way I can make postgresql start returning result rows to the client without pause, also for case 2?
  • EDIT (see below): It looks like setFetchSize is the key to solving this. In my case I execute the query from Python, using SQLAlchemy with the psycopg2 driver. How can I set that option for a single query (executed by session.execute)?

The column time is the primary key, BTW.

EDIT:

I believe this excerpt from the JDBC driver documentation describes the problem and hints at a solution (I still need help - see the last bullet list item above):

By default the driver collects all the results for the query at once. This can be inconvenient for large data sets so the JDBC driver provides a means of basing a ResultSet on a database cursor and only fetching a small number of rows.

and

Changing code to cursor mode is as simple as setting the fetch size of the Statement to the appropriate size. Setting the fetch size back to 0 will cause all rows to be cached (the default behaviour).

// make sure autocommit is off
conn.setAutoCommit(false);
Statement st = conn.createStatement();

// Turn use of the cursor on.
st.setFetchSize(50);
codeape asked Jan 08 '10 12:01


1 Answer

The psycopg2 DB-API driver buffers the whole query result before returning any rows. You'll need to use a server-side cursor to fetch results incrementally. For SQLAlchemy, see server_side_cursors in the docs and, if you're using the ORM, the Query.yield_per() method.
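At the driver level, a server-side cursor can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the helper name stream_rows, the cursor name, and the batch size are all assumptions made for the example, and conn is assumed to be an open psycopg2 connection with autocommit left off (named cursors need a transaction).

```python
from typing import Any, Iterator, Tuple


def stream_rows(conn: Any, query: str, batch_size: int = 1000) -> Iterator[Tuple]:
    """Yield rows one at a time from a server-side (named) cursor.

    `conn` is assumed to be an open psycopg2 connection; `stream_rows`
    itself is a hypothetical helper, not part of psycopg2.
    """
    # Passing a name makes psycopg2 declare a server-side cursor, so rows
    # are fetched in batches instead of being buffered all at once.
    with conn.cursor(name="stream_rows_cursor") as cur:
        # Number of rows pulled per round trip while iterating.
        cur.itersize = batch_size
        cur.execute(query)
        for row in cur:
            yield row


def demo() -> None:
    """Usage sketch; the connection string is a placeholder."""
    import psycopg2  # imported here so stream_rows stays easy to test

    conn = psycopg2.connect("dbname=mydb")  # hypothetical DSN
    try:
        query = "SELECT time, value FROM data ORDER BY time"
        for time, value in stream_rows(conn, query, batch_size=50):
            print(time, value)
    finally:
        conn.close()
```

With this approach the first rows should become available as soon as the first batch arrives, rather than after the whole result has been transferred.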

SQLAlchemy currently doesn't have an option to set that for a single query, but there is a ticket with a patch that implements it.
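Newer SQLAlchemy versions expose a stream_results execution option that can be applied to a single connection or statement. The sketch below assumes such a version; the helper name stream_result and the connection URL are made up for illustration, and whether rows really stream depends on the driver (psycopg2 uses a server-side cursor for this option).

```python
from typing import Any, Iterator


def stream_result(conn: Any, statement: Any) -> Iterator[Any]:
    """Yield rows from `statement` without buffering the full result.

    `conn` is assumed to be a SQLAlchemy Connection; `stream_result`
    is a hypothetical helper, not part of SQLAlchemy.
    """
    # stream_results=True asks the dialect to use a server-side cursor
    # where the underlying driver supports one.
    streaming_conn = conn.execution_options(stream_results=True)
    result = streaming_conn.execute(statement)
    for row in result:
        yield row


def demo() -> None:
    """Usage sketch; the database URL is a placeholder."""
    from sqlalchemy import create_engine, text

    # Hypothetical credentials and database name.
    engine = create_engine("postgresql+psycopg2://user:password@localhost/mydb")
    with engine.connect() as conn:
        stmt = text("SELECT time, value FROM data ORDER BY time")
        for row in stream_result(conn, stmt):
            print(row.time, row.value)
```

Only the query passed through stream_result is affected; other statements on the same engine keep the default buffering behaviour.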

Ants Aasma answered Sep 28 '22 04:09