
Python psycopg2 cursors

From psycopg2 documentation:

When a database query is executed, the Psycopg cursor usually fetches all the records returned by the backend, transferring them to the client process. If the query returned an huge amount of data, a proportionally large amount of memory will be allocated by the client. If the dataset is too large to be practically handled on the client side, it is possible to create a server side cursor.

I would like to query a table with possibly thousands of rows and do some action for each one. Will normal cursors actually bring the entire data set on the client? That doesn't sound very reasonable. The code is something along the lines of:

import psycopg2

conn = psycopg2.connect(url)
cursor = conn.cursor()
cursor.execute(sql)
for row in cursor:
    pass  # do some stuff with row
cursor.close()

I would expect this to be a streaming operation. My second question concerns the scope of cursors. Inside my loop I would like to update another table. Do I need to open a new cursor and close it every time? Each item update should be in its own transaction, as I might need to do a rollback.

for row in cursor:
    anotherCursor = anotherConn.cursor()
    anotherCursor.execute(update)
    if somecondition:
        anotherConn.commit()
    else:
        anotherConn.rollback()  # parentheses are required; a bare "rollback" does nothing
    anotherCursor.close()
cursor.close()

======== EDIT: MY ANSWER TO FIRST PART ========

Ok, I will try to answer the first part of my question. The normal cursors actually bring the entire data set as soon as you call execute, before even starting to iterate the result set. You can verify that by checking the memory footprint of the process at each step. But the need for a server side cursor is actually due to postgres server and not the client, and is documented here: http://www.postgresql.org/docs/9.3/static/sql-declare.html

Now, this is not immediately apparent from the documentation, but such cursors can actually be created temporarily, for the duration of the transaction. There is no need to explicitly create a function in the database that returns a refcursor for the specific SQL statement, etc. With psycopg2 you only need to give a name when obtaining the cursor, and a temporary cursor will be created for that transaction. So instead of:

 cursor = conn.cursor()

you just need to do:

 cursor = conn.cursor('mycursor')

That's it and it works. I assume the same thing is done under the covers when using JDBC, when setting fetchSize. It's just a bit more transparent. See docs here: https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor
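For illustration, here is a minimal sketch of how the named cursor could be wrapped for streaming. The helper name stream_rows, the cursor name and the batch size are my own assumptions, not anything psycopg2 prescribes; conn is expected to be an open psycopg2 connection. The itersize attribute of a named cursor controls how many rows are fetched per network round trip while iterating, which is roughly what fetchSize does in JDBC.

```python
def stream_rows(conn, sql, params=None, cursor_name="stream_cursor", itersize=1000):
    """Yield rows one by one through a server-side (named) cursor.

    conn is assumed to be an open psycopg2 connection; the helper name,
    cursor name and itersize default are illustrative choices.
    """
    # Passing a name makes psycopg2 use a server-side cursor.
    cur = conn.cursor(name=cursor_name)
    try:
        # Rows are pulled from the server in batches of `itersize`
        # instead of all at once.
        cur.itersize = itersize
        cur.execute(sql, params)
        for row in cur:
            yield row
    finally:
        # Closing the client-side cursor also closes the server-side one.
        cur.close()
```

Usage would then look like "for row in stream_rows(conn, sql): ...", with the client holding at most one batch of rows at a time.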

You can test that this works by querying the pg_cursors view inside the same transaction. The server side cursor appears after obtaining the client side cursor and disappears after closing the client side cursor. So bottom line: I'm happy to do that change to my code, but I must say this was a big gotcha for someone not that experienced with postgres.
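A rough sketch of that check, assuming conn is the same psycopg2 connection (and therefore the same transaction) that owns the named cursor. The helper name open_server_cursors is made up for this example. As far as I can tell, psycopg2 only issues the DECLARE when execute() is called on the named cursor, so the entry should show up after that call rather than right after conn.cursor('mycursor').

```python
def open_server_cursors(conn):
    """Return the names of the cursors currently open in this session.

    conn is assumed to be an open psycopg2 connection; pg_cursors is a
    PostgreSQL system view listing the cursors of the current session.
    """
    cur = conn.cursor()  # ordinary client-side cursor, used only for this check
    try:
        cur.execute("SELECT name FROM pg_cursors")
        return [row[0] for row in cur.fetchall()]
    finally:
        cur.close()
```

Calling it once after executing a query on the named cursor and once after closing that cursor should show the name appearing and then disappearing.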

Nazaret K. asked May 24 '15 19:05




1 Answer

Actually, you have already answered the question ;).

  1. Yes, you should use a server-side cursor to get the records streamed: http://initd.org/psycopg/docs/usage.html#server-side-cursors

From docs:

CREATE FUNCTION reffunc(refcursor) RETURNS refcursor AS $$
BEGIN
    OPEN $1 FOR SELECT col FROM test;
    RETURN $1;
END;
$$ LANGUAGE plpgsql;

And in code:

cur1 = conn.cursor()
cur1.callproc('reffunc', ['curname'])

cur2 = conn.cursor('curname')
for record in cur2:     # or cur2.fetchone, fetchmany...
    # do something with record
    pass
  2. Yes, you should open a new cursor if you want to get the rows with a server-side cursor.
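
The per-row update from the question could be sketched roughly as below. This is only an illustration under stated assumptions: update_each_row, the cursor name row_stream and the should_commit callback are hypothetical, both connections are assumed to be open psycopg2 connections, and each fetched row is assumed to match the placeholders in update_sql. Using a second connection for the writes matters because a named cursor lives inside a transaction: committing on the connection that owns it would close it, unless it was created with withhold=True.

```python
def update_each_row(read_conn, write_conn, select_sql, update_sql, should_commit):
    """Stream rows from read_conn and run one update per row on write_conn.

    Each update is committed or rolled back on its own. Returns a tuple
    (committed, rolled_back). All names here are illustrative.
    """
    committed = rolled_back = 0
    read_cur = read_conn.cursor(name="row_stream")  # server-side cursor
    try:
        read_cur.execute(select_sql)
        for row in read_cur:
            write_cur = write_conn.cursor()  # fresh client-side cursor per row
            try:
                # Assumption: the row tuple supplies the update's parameters.
                write_cur.execute(update_sql, row)
                if should_commit(row):
                    write_conn.commit()
                    committed += 1
                else:
                    write_conn.rollback()
                    rolled_back += 1
            except Exception:
                # Leave the write connection in a clean state before re-raising.
                write_conn.rollback()
                raise
            finally:
                write_cur.close()
    finally:
        read_cur.close()
    return committed, rolled_back
```

Opening a cursor per row is cheap for client-side cursors; reusing a single write cursor for the whole loop would work just as well, since the transaction boundary belongs to the connection and not to the cursor.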
kwarunek answered Sep 16 '22 14:09