Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using a buffer to write a psycopg3 copy result through pandas

Using psycopg2, I could write large results as CSV using copy_expert and a BytesIO buffer like this with pandas:

copy_sql = "COPY (SELECT * FROM big_table) TO STDOUT CSV"

buffer = BytesIO()
cursor.copy_expert(copy_sql, buffer, size=8192)
buffer.seek(0)
pd.read_csv(buffer, engine="c").to_excel(self.output_file)

However, I can't figure out how to replace the buffer in copy_expert with psycopg3's new copy command. Has anyone figured out a way to do this?

like image 787
FlipperPA Avatar asked Sep 02 '25 17:09

FlipperPA


1 Answers

The key to writing a large query to a file through psycopg3 in this fashion is to use a SpooledTemporaryFile, which will limit the amount of memory usage in Python (see max_size). Then after the CSV is written to disk, convert with pandas.

from tempfile import SpooledTemporaryFile
from pandas import read_csv
from psycopg import connect

cursor = connect([connection]).cursor()
copy_sql = "COPY (SELECT * FROM stocks WHERE price > %s) TO STDOUT"
price = 100

with SpooledTemporaryFile(
    mode="wb",
    max_size=65546,
    buffering=8192,
) as tmpfile:
    with cursor.copy(copy_sql, (price,)) as copy:
        for data in copy:
            tmpfile.write(data)
    tmpfile.seek(0)

    read_csv(tmpfile, engine="c").to_excel("my_spreadsheet.xlsx")
like image 81
FlipperPA Avatar answered Sep 05 '25 05:09

FlipperPA