
Python PostgreSQL COPY command used to INSERT or UPDATE (not just INSERT)

I'm trying to use the COPY command to insert data from a file into PGSQL via Python. This works incredibly well when the target table is empty or I ensure ahead of time there will be no unique key collisions:

cmd = ("COPY %s (%s) FROM STDIN WITH (FORMAT CSV, NULL '_|NULL|_')" %
               (tableName, colStr))
cursor.copy_expert(cmd, io)

I'd prefer, however, to be able to run this COPY command without first emptying the table. Is there any way to do an 'INSERT or UPDATE' type of operation with SQL COPY?

slumtrimpet asked Oct 25 '17 13:10




2 Answers

Not directly through the copy command.

What you can do, however, is create a temporary table, populate it with the COPY command, and then run your update and insert from it.

-- Clone table structure of target table
create temporary table __copy as (select * from my_schema.my_table limit 0);


-- Copy command goes here...


-- Update existing records
update
    my_schema.my_table
set
    column_2 = __copy.column_2
from
    __copy
where
    my_table.column_1 = __copy.column_1;


-- Insert new records
insert into my_schema.my_table (
    column_1,
    column_2
) (
    -- Columns are qualified because column_2 exists in both tables
    select
        __copy.column_1,
        __copy.column_2
    from
        __copy
        left join my_schema.my_table using(column_1)
    where
        my_table.column_1 is null
);

You might consider creating an index on __copy after populating it with data to speed the update query up.
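
The steps above could be driven from Python roughly as follows. This is a minimal sketch, not the answerer's code: it assumes a psycopg2-style cursor (anything exposing execute() and copy_expert()), and the names upsert_via_staging and quote_ident are hypothetical. The NULL marker is the one from the question. The insert step uses NOT EXISTS instead of the left join shown above, which selects the same rows.

```python
def quote_ident(name):
    """Double-quote an SQL identifier; a stand-in for psycopg2's sql.Identifier."""
    return '"' + name.replace('"', '""') + '"'


def upsert_via_staging(cursor, csv_file, schema, table, key_col, data_cols):
    """Load csv_file into a temp table, then update matches and insert the rest."""
    target = f"{quote_ident(schema)}.{quote_ident(table)}"
    key = quote_ident(key_col)
    cols = [quote_ident(c) for c in data_cols]  # assumes at least one data column
    col_list = ", ".join([key] + cols)

    # 1. Empty clone of the target's structure. ON COMMIT DROP removes it at
    #    commit, which assumes autocommit is off (the psycopg2 default).
    cursor.execute(
        f"CREATE TEMPORARY TABLE __copy ON COMMIT DROP AS "
        f"SELECT * FROM {target} LIMIT 0"
    )

    # 2. Bulk-load the file into the staging table.
    cursor.copy_expert(
        f"COPY __copy ({col_list}) FROM STDIN "
        f"WITH (FORMAT CSV, NULL '_|NULL|_')",
        csv_file,
    )

    # 3. Update rows whose key already exists in the target.
    set_clause = ", ".join(f"{c} = __copy.{c}" for c in cols)
    cursor.execute(
        f"UPDATE {target} AS t SET {set_clause} "
        f"FROM __copy WHERE t.{key} = __copy.{key}"
    )

    # 4. Insert rows whose key is not in the target yet.
    cursor.execute(
        f"INSERT INTO {target} ({col_list}) "
        f"SELECT {col_list} FROM __copy "
        f"WHERE NOT EXISTS "
        f"(SELECT 1 FROM {target} AS t WHERE t.{key} = __copy.{key})"
    )
```

A call might look like upsert_via_staging(conn.cursor(), open_file, "my_schema", "my_table", "column_1", ["column_2"]) followed by conn.commit(). Note that quoted identifiers are case-sensitive, so the names passed in have to match the stored names exactly.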

Scoots answered Sep 20 '22 09:09


Consider using a temp table as a staging table that receives the CSV file data. Then append into the final table using Postgres' INSERT ... ON CONFLICT (colname) DO UPDATE ..., available in version 9.5+. See the docs. Note that the special excluded table is used to reference the values originally proposed for insertion.
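
To illustrate how the excluded table fits in, here is a small sketch that assembles such a statement from a column list. The function name build_upsert_sql and the helper quote_ident are hypothetical, and the statement assumes a unique index or constraint exists on the key column, since ON CONFLICT needs one to detect collisions.

```python
def quote_ident(name):
    """Double-quote an SQL identifier; a stand-in for psycopg2's sql.Identifier."""
    return '"' + name.replace('"', '""') + '"'


def build_upsert_sql(target, staging, key_col, data_cols):
    """Return an INSERT ... SELECT ... ON CONFLICT DO UPDATE statement."""
    key = quote_ident(key_col)
    cols = [quote_ident(c) for c in data_cols]  # assumes at least one data column
    col_list = ", ".join([key] + cols)
    # EXCLUDED refers to the row that was proposed for insertion.
    set_clause = ", ".join(f"{c} = EXCLUDED.{c}" for c in cols)
    return (
        f"INSERT INTO {quote_ident(target)} ({col_list}) "
        f"SELECT {col_list} FROM {quote_ident(staging)} "
        f"ON CONFLICT ({key}) DO UPDATE SET {set_clause}"
    )


# Example with made-up lowercase names; pass the result to cursor.execute().
print(build_upsert_sql("final_table", "temp_table", "id_column", ["col1", "col2"]))
```

Because every name is quoted, the identifiers are matched case-sensitively; lowercase names behave the same as unquoted ones.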

Also, assuming you use psycopg2, consider using sql.Identifier() to safely bind identifiers such as table or column names. However, you would need to decompose colStr to wrap the individual items:

from psycopg2 import sql
...
cursor.execute("DELETE FROM tempTable")
conn.commit()

cmd = sql.SQL("COPY {0} ({1}) FROM STDIN WITH (FORMAT CSV, NULL '_|NULL|_')")\
              .format(sql.Identifier(temptableName),
                      sql.SQL(', ').join([sql.Identifier('col1'), 
                                          sql.Identifier('col2'), 
                                          sql.Identifier('col3')]))
cursor.copy_expert(cmd, io)

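# Named upsert_sql so it does not shadow the imported psycopg2 sql module
upsert_sql = "INSERT INTO finalTable (id_column, Col1, Col2, Col3)" + \
      " SELECT id_column, Col1, Col2, Col3 FROM tempTable t" + \
      " ON CONFLICT (id_column) DO UPDATE SET Col1 = EXCLUDED.Col1," + \
      "                                       Col2 = EXCLUDED.Col2," + \
      "                                       Col3 = EXCLUDED.Col3;"
# ... extend the SET list above for any further columns

cursor.execute(upsert_sql)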
conn.commit()

Parfait answered Sep 20 '22 09:09