 

PostgreSQL ON CONFLICT in SQLAlchemy

I've read quite a few resources (among others 1, 2), but I'm unable to get PostgreSQL's ON CONFLICT ignore behaviour working in SQLAlchemy.

I've used this accepted answer as a basis, but it gives

SAWarning: Can't validate argument 'append_string'; can't locate any SQLAlchemy dialect named 'append'

I've tried adding the postgresql dialect to the @compiles decorator and renaming my object, but it doesn't work. I also tried str(insert()) + " ON CONFLICT IGNORE" without results (not surprising, admittedly).
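For reference, the warning seems to come from the keyword name itself: SQLAlchemy splits unknown keyword arguments on the first underscore and treats the prefix as a dialect name, so append_string is parsed as argument string for a non-existent dialect named append. Roughly this kind of call reproduces it (simplified, not my exact code):

from sqlalchemy import table, column
from sqlalchemy.sql.expression import insert

t = table("mytable", column("id"))
# The unknown kwarg is interpreted as "<dialect>_<argname>", hence the
# SAWarning about a dialect named 'append'
stmt = insert(t, append_string="ON CONFLICT DO NOTHING")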

How can I get ON CONFLICT IGNORE (i.e. ON CONFLICT DO NOTHING) added to my inserts? I like the proposed solution, as I can see myself not wanting the IGNORE behaviour on every INSERT.

P.S. Using Python 2.7 (I don't mind upgrading to 3.4/3.5) and the latest SQLAlchemy (1.x).

asked Oct 23 '15 by puredevotion


2 Answers

This works with PostgreSQL 9.5:

from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert

@compiles(Insert)
def prefix_inserts(insert, compiler, **kw):
    # Render the standard INSERT, then append the conflict clause
    return compiler.visit_insert(insert, **kw) + " ON CONFLICT DO NOTHING"

I use it for bulk_insert_mappings. It does, however, apply ON CONFLICT DO NOTHING to every INSERT; the behaviour is not optional per statement. A sketch of one way to make it opt-in follows.
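A minimal sketch of one way to make it opt-in (my addition, not part of the original answer): register the compiler hook on a dedicated Insert subclass, so only statements built with that subclass get the suffix. The name InsertIgnore is hypothetical.

from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert

class InsertIgnore(Insert):
    """Insert construct that appends ON CONFLICT DO NOTHING (hypothetical helper)."""
    pass

@compiles(InsertIgnore, "postgresql")
def _compile_insert_ignore(insert, compiler, **kw):
    # Render the normal INSERT and append the conflict clause only for this subclass
    return compiler.visit_insert(insert, **kw) + " ON CONFLICT DO NOTHING"

# Usage: InsertIgnore(my_table).values(...) is skipped on conflict,
# while a plain insert(my_table) keeps the default behaviour.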

answered Oct 02 '22 by Niklas B


Using Postgres 9.6.1, SQLAlchemy 1.1.4, and psycopg2 2.6.2:

  1. Convert your data structure to a list of dictionaries. From pandas it is:

    import pandas
    from sqlalchemy import MetaData
    from sqlalchemy.dialects.postgresql import insert
    import psycopg2
    
    # The dictionary should include all the values including index values
    insrt_vals = df.to_dict(orient='records')
    
  2. Connect to the database through sqlalchemy, reflect the target table, and build an INSERT ... ON CONFLICT DO NOTHING statement. (Alternatively, skip steps 2 and 3 and use the psycopg2 driver with Postgres's native COPY command, shown at the end.)

    engine = sqlalchemy.create_engine("postgresql://user:password@localhost/postgres")
    meta = MetaData(bind=engine)
    meta.reflect(bind=engine)
    table = meta.tables['tstbl']
    insrt_stmnt = insert(table).values(insrt_vals)
    do_nothing_stmt = insrt_stmnt.on_conflict_do_nothing(index_elements=['colA', 'colB'])
    
  3. Execute

    results = engine.execute(do_nothing_stmt)
    # Get number of rows inserted
    rowcount = results.rowcount
    

Warning:

This method does not work with NaTs out of the box.
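One possible workaround for the NaT issue (my own sketch, not from the original answer) is to convert NaT/NaN to None before building the records, so psycopg2 sends SQL NULL:

import pandas as pd

# Cast to object first so where() can put real None values in place of NaT/NaN
df = df.astype(object).where(pd.notnull(df), None)
insrt_vals = df.to_dict(orient='records')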

Everything together:

import datetime

import pandas as pd
import sqlalchemy
from sqlalchemy import MetaData
from sqlalchemy.dialects.postgresql import insert

tst_df = pd.DataFrame({'colA': ['a', 'b', 'c', 'a', 'z', 'q'],
                       'colB': pd.date_range(end=datetime.datetime.now(), periods=6),
                       'colC': ['a1', 'b2', 'c3', 'a4', 'z5', 'q6']})

insrt_vals = tst_df.to_dict(orient='records')

engine = sqlalchemy.create_engine("postgresql://user:password@localhost/postgres")
connect = engine.connect()
meta = MetaData(bind=engine)
meta.reflect(bind=engine)
table = meta.tables['tstbl']

insrt_stmnt = insert(table).values(insrt_vals)
do_nothing_stmt = insrt_stmnt.on_conflict_do_nothing(index_elements=['colA', 'colB'])
results = engine.execute(do_nothing_stmt)
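
A small follow-up check (my addition, not part of the original answer): rows skipped by ON CONFLICT DO NOTHING are not counted in rowcount, so comparing it with the number of records shows how many were dropped:

inserted = results.rowcount           # rows actually written
skipped = len(insrt_vals) - inserted  # rows skipped by ON CONFLICT DO NOTHING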

Instead of steps 2 and 3, using the psycopg2 driver with Postgres's COPY command is faster for larger files (approaching a gigabyte), because it loads the data in bulk instead of issuing row-by-row INSERTs:

import os
import psycopg2

csv_data = os.path.realpath('test.csv')
con = psycopg2.connect(database='db01', user='postgres')
cur = con.cursor()
# \copy is a psql client command; from psycopg2, use copy_expert with server-side COPY
with open(csv_data) as f:
    cur.copy_expert("COPY stamm_data FROM STDIN WITH (FORMAT csv, DELIMITER ';', HEADER)", f)
con.commit()
answered Oct 02 '22 by Itay Livni