 

Insert into postgreSQL table from pandas with "on conflict" update

I have a pandas DataFrame that I need to store into the database. Here's my current line of code for inserting:

df.to_sql(table,con=engine,if_exists='append',index_label=index_col)

This works fine if none of the rows in df exist in my table. If a row already exists, I get this error:

sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) duplicate key
value violates unique constraint "mypk"
DETAIL:  Key (id)=(42) already exists.
 [SQL: 'INSERT INTO mytable (id, owner,...) VALUES (%(id)s, %(owner)s,...']
 [parameters:...] (Background on this error at: http://sqlalche.me/e/gkpj)

and nothing is inserted.

PostgreSQL has an optional ON CONFLICT clause, which could be used to UPDATE the existing table rows. I read the entire pandas.DataFrame.to_sql manual page and couldn't find any way to use ON CONFLICT within the DataFrame.to_sql() function.

I have considered splitting my DataFrame in two based on what's already in the db table. So now I have two DataFrames, insert_rows and update_rows, and I can safely execute

insert_rows.to_sql(table, con=engine, if_exists='append', index_label=index_col)

But then there seems to be no UPDATE equivalent of DataFrame.to_sql(). So how do I update the table using the DataFrame update_rows?
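A minimal sketch of the split described above, assuming the key column is the DataFrame index (as with index_label=index_col) and that the keys already in the table have been fetched into a Python set. The sample data and the name existing_ids are hypothetical:

```python
import pandas as pd

# Hypothetical data: the key ("id") is the index, as with index_label=index_col
df = pd.DataFrame(
    {"owner": ["alice", "carol", "dave"]},
    index=pd.Index([41, 42, 43], name="id"),
)
# Hypothetical stand-in for the keys already in the table
# (e.g. fetched with SELECT id FROM mytable)
existing_ids = {42}

is_existing = df.index.isin(existing_ids)
insert_rows = df[~is_existing]  # keys not yet in the table: safe to append
update_rows = df[is_existing]   # keys that would violate the primary key
```

insert_rows can go through to_sql exactly as in the question; update_rows is the part that still needs an UPDATE.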

Granny Aching asked Jan 31 '26 20:01


1 Answer

I know it's an old thread, but I ran into the same issue and this thread showed up in Google. None of the answers is really satisfying yet, so here's what I came up with:

My solution is pretty similar to zdgriffith's answer, but much more performant as there's no need to iterate over data_iter:

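def postgres_upsert(table, conn, keys, data_iter):
    from sqlalchemy.dialects.postgresql import insert

    # One dict per row: {column name: value}
    data = [dict(zip(keys, row)) for row in data_iter]

    # table is pandas' SQLTable wrapper; table.table is the SQLAlchemy Table
    insert_statement = insert(table.table).values(data)
    # On a primary-key clash, overwrite every column with the incoming value
    # ("excluded" is the row that was proposed for insertion)
    upsert_statement = insert_statement.on_conflict_do_update(
        constraint=f"{table.table.name}_pkey",
        set_={c.key: c for c in insert_statement.excluded},
    )
    result = conn.execute(upsert_statement)
    # pandas >= 1.4 reports this as the number of affected rows
    return result.rowcount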

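Now you can pass this custom upsert function to pandas' to_sql through its method argument, as zdgriffith showed, e.g. df.to_sql(table, con=engine, if_exists='append', index_label=index_col, method=postgres_upsert).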
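To try the pattern end to end without a PostgreSQL server, here is a minimal sketch that swaps in SQLite (3.24 or newer) through SQLAlchemy's sqlite dialect (SQLAlchemy 1.4 or newer), which offers the same on_conflict_do_update/excluded API but targets conflicts by column via index_elements rather than by constraint name. The function sqlite_upsert, the table mytable and the sample rows are hypothetical:

```python
import pandas as pd
from sqlalchemy import create_engine, text
from sqlalchemy.dialects.sqlite import insert


def sqlite_upsert(table, conn, keys, data_iter):
    # Hypothetical SQLite counterpart of postgres_upsert
    data = [dict(zip(keys, row)) for row in data_iter]
    insert_statement = insert(table.table).values(data)
    upsert_statement = insert_statement.on_conflict_do_update(
        # SQLite has no named-constraint target, so name the key column(s)
        index_elements=["id"],
        set_={c.key: c for c in insert_statement.excluded},
    )
    result = conn.execute(upsert_statement)
    return result.rowcount


engine = create_engine("sqlite://")  # in-memory database
with engine.begin() as conn:
    # The table needs a primary key for ON CONFLICT to have something to hit
    conn.execute(text("CREATE TABLE mytable (id INTEGER PRIMARY KEY, owner TEXT)"))

first = pd.DataFrame({"id": [41, 42], "owner": ["alice", "bob"]})
first.to_sql("mytable", con=engine, if_exists="append", index=False, method=sqlite_upsert)

# id 42 already exists: it is updated instead of raising IntegrityError
second = pd.DataFrame({"id": [42, 43], "owner": ["carol", "dave"]})
second.to_sql("mytable", con=engine, if_exists="append", index=False, method=sqlite_upsert)

rows = pd.read_sql("SELECT id, owner FROM mytable ORDER BY id", engine)
print(rows)
```

index=False keeps pandas from sending its own "index" column, which the pre-created table does not have.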

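Please note that my upsert function assumes the table's primary key constraint has PostgreSQL's default name, <table>_pkey. If yours is named differently (the constraint in the question is called "mypk"), change the constraint argument of .on_conflict_do_update accordingly; you can target any other unique constraint the same way, or pass index_elements=[...] to name the conflicting column(s) instead.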
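To see what each conflict target turns into, here is a sketch that compiles the statement with SQLAlchemy's PostgreSQL dialect (no database connection needed), once targeting the question's constraint "mypk" by name and once via index_elements. The Table definition is a hypothetical stand-in for the question's mytable:

```python
from sqlalchemy import Column, Integer, MetaData, Table, Text
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import insert

# Hypothetical table mirroring the question: primary key "id"
mytable = Table(
    "mytable",
    MetaData(),
    Column("id", Integer, primary_key=True),
    Column("owner", Text),
)

stmt = insert(mytable).values(id=42, owner="carol")

# Option 1: target the constraint by its name
by_name = stmt.on_conflict_do_update(
    constraint="mypk",
    set_={"owner": stmt.excluded.owner},
)
# Option 2: target the conflicting column(s) instead
by_columns = stmt.on_conflict_do_update(
    index_elements=["id"],
    set_={"owner": stmt.excluded.owner},
)

pg = postgresql.dialect()
sql_by_name = str(by_name.compile(dialect=pg))
sql_by_columns = str(by_columns.compile(dialect=pg))
print(sql_by_name)
print(sql_by_columns)
```

The column form avoids hard-coding a constraint name, at the cost of relying on PostgreSQL to infer a matching unique index.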

This SO answer on a related thread explains the use of .excluded a bit more: https://stackoverflow.com/a/51935542/7066758

SaturnFromTitan answered Feb 02 '26 10:02


