Postgresql ON CONFLICT in sqlalchemy

I've read quite a few resources (among others 1, 2), but I'm unable to get PostgreSQL's ON CONFLICT IGNORE behaviour working in SQLAlchemy.

I've used this accepted answer as a basis, but it gives

SAWarning: Can't validate argument 'append_string'; can't locate any SQLAlchemy dialect named 'append'

I've tried adding the postgresql dialect to the @compiles decorator and renaming my object, but neither works. I also tried str(insert()) + " ON CONFLICT IGNORE", without results (not surprising, by the way).

How can I get ON CONFLICT IGNORE added to my inserts? I like the proposed solution, as I can see myself not wanting the IGNORE behaviour on every INSERT.

P.S. I'm using Python 2.7 (I don't mind upgrading to 3.4/3.5) and the latest SQLAlchemy (1.x).

asked Oct 23 '15 16:10

puredevotion

2 Answers

This works with Postgresql 9.5:

from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert

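# Registered without a dialect name, so it applies to every compiled Insert.
@compiles(Insert)
def prefix_inserts(insert, compiler, **kw):
    # Render the normal INSERT, then append the conflict clause to it.
    return compiler.visit_insert(insert, **kw) + " ON CONFLICT DO NOTHING"

I use it for bulk_insert_mappings. However, it does not make ON CONFLICT DO NOTHING optional: the clause is appended to every INSERT.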
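
If the clause needs to be optional per statement, one alternative is to drop the global compiler hook and use the PostgreSQL-specific insert() construct (available since SQLAlchemy 1.1), calling on_conflict_do_nothing() only where it is wanted. Below is a minimal sketch, not a drop-in replacement for bulk_insert_mappings. The users table and the build_insert helper are hypothetical names made up for illustration, and the statements are only compiled to SQL, so no database is needed.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import insert

metadata = MetaData()
# Hypothetical table, used only for illustration.
users = Table(
    "users",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
)


def build_insert(table, rows, ignore_conflicts=False):
    """Build an INSERT, adding ON CONFLICT DO NOTHING only on request."""
    stmt = insert(table).values(rows)
    if ignore_conflicts:
        # Without arguments, any unique violation is silently skipped.
        stmt = stmt.on_conflict_do_nothing()
    return stmt


rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
pg = postgresql.dialect()

# Compile to SQL strings to inspect what would be sent to the server.
plain_sql = str(build_insert(users, rows).compile(dialect=pg))
ignore_sql = str(build_insert(users, rows, ignore_conflicts=True).compile(dialect=pg))

print(plain_sql)
print(ignore_sql)
```

Only the second statement should end with the conflict clause, so the behaviour can be chosen per INSERT.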

answered Oct 02 '22 20:10

Niklas B

Using Postgres 9.6.1, SQLAlchemy 1.1.4, and psycopg2 2.6.2:

1. Convert your data structure to a dictionary. From pandas it is:

import pandas
from sqlalchemy import MetaData
from sqlalchemy.dialects.postgresql import insert
import psycopg2

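# The records should include all the values to insert, including index values.
# to_dict(orient='records') drops the index, so call df.reset_index() first
# if the index holds data that belongs in the table.
insrt_vals = df.to_dict(orient='records')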

2. Connect to the database through SQLAlchemy:

import sqlalchemy

engine = sqlalchemy.create_engine("postgresql://user:password@localhost/postgres")
connect = engine.connect()
meta = MetaData(bind=engine)
meta.reflect(bind=engine)

3. Execute:

table = meta.tables['tstbl']
insrt_stmnt = insert(table).values(insrt_vals)
do_nothing_stmt = insrt_stmnt.on_conflict_do_nothing(index_elements=['colA','colB'])
results = engine.execute(do_nothing_stmt)
# Get number of rows inserted
rowcount = results.rowcount

Warning:

This method does not work with NaTs out of the box.
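
One possible workaround for the NaT issue is to replace missing values with None before building the statement, since None is sent to the database as NULL. This is a minimal sketch under assumptions: the helper name records_without_nat is hypothetical, and every cell is assumed to hold a scalar value (pd.isna returns an array for list-like cells, which this does not handle).

```python
import pandas as pd


def records_without_nat(df):
    """Return row dicts with NaT/NaN replaced by None (assumes scalar cells)."""
    return [
        {key: (None if pd.isna(value) else value) for key, value in row.items()}
        for row in df.to_dict(orient="records")
    ]


# Small illustrative frame with one missing timestamp.
df = pd.DataFrame({
    "colA": ["a", "b"],
    "colB": pd.to_datetime(["2020-01-01", None]),
})

records = records_without_nat(df)
print(records)
```

The resulting list can be passed to insert(table).values(...) in place of the raw to_dict() output.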

Everything together

import datetime
import pandas as pd
import sqlalchemy
from sqlalchemy import MetaData
from sqlalchemy.dialects.postgresql import insert

tst_df = pd.DataFrame({'colA': ['a','b','c','a','z','q'],
                       'colB': pd.date_range(end=datetime.datetime.now(), periods=6),
                       'colC': ['a1','b2','c3','a4','z5','q6']})

insrt_vals = tst_df.to_dict(orient='records')
# MetaData(bind=...) and engine.execute() are SQLAlchemy 1.x APIs.
engine = sqlalchemy.create_engine("postgresql://user:password@localhost/postgres")
connect = engine.connect()
meta = MetaData(bind=engine)
meta.reflect(bind=engine)
table = meta.tables['tstbl']
insrt_stmnt = insert(table).values(insrt_vals)

# The conflict target needs a unique index or constraint on (colA, colB).
do_nothing_stmt = insrt_stmnt.on_conflict_do_nothing(index_elements=['colA','colB'])
results = engine.execute(do_nothing_stmt)

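Instead of steps 2 and 3, using the psycopg2 driver with PostgreSQL's COPY command is faster for larger files (approaching a gigabyte), because it loads all rows in one bulk operation instead of building a large INSERT statement. Note that COPY has no ON CONFLICT clause, so a duplicate key makes the whole load fail.

import os
import psycopg2

csv_data = os.path.realpath('test.csv')
con = psycopg2.connect(database='db01', user='postgres')
cur = con.cursor()
# \copy is a psql meta-command and cannot be sent through psycopg2;
# copy_expert streams the local file into COPY ... FROM STDIN instead.
with open(csv_data) as f:
    cur.copy_expert("COPY stamm_data FROM STDIN WITH (FORMAT csv, HEADER, DELIMITER ';')", f)
con.commit()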

answered Oct 02 '22 19:10

Itay Livni

Tags: python, sql, postgresql, sqlalchemy