 

SQLAlchemy - performing a bulk upsert (if exists, update, else insert) in postgresql

I am trying to write a bulk upsert in python using the SQLAlchemy module (not in SQL!).

I am getting the following error on a SQLAlchemy add:

sqlalchemy.exc.IntegrityError: (IntegrityError) duplicate key value violates unique constraint "posts_pkey" DETAIL:  Key (id)=(TEST1234) already exists. 

I have a table called posts with a primary key on the id column.

In this example, I already have a row in the db with id=TEST1234. When I attempt to db.session.add() a new posts object with the id set to TEST1234, I get the error above. I was under the impression that if the primary key already exists, the record would get updated.

How can I upsert with Flask-SQLAlchemy based on primary key alone? Is there a simple solution?

If there is not, I can always check for and delete any record with a matching id, and then insert the new record, but that seems expensive for my situation, where I do not expect many updates.

mgoldwasser asked Sep 21 '14 02:09



2 Answers

There is an upsert-esque operation in SQLAlchemy:

db.session.merge()

With merge() I was able to perform upserts. Note, however, that it is slow for a bulk "upsert": every merged object whose primary key is not already loaded in the session triggers its own SELECT.
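To make the merge() behaviour concrete, here is a minimal sketch. It assumes SQLAlchemy 1.4 or newer and uses an in-memory SQLite database purely so that it runs without a server; the Post model and the title column are hypothetical stand-ins for the posts table from the question.

```python
from sqlalchemy import Column, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Post(Base):
    # Hypothetical model mirroring the "posts" table from the question
    __tablename__ = "posts"

    id = Column(String, primary_key=True)
    title = Column(String)


# In-memory SQLite is an assumption made only to keep the sketch self-contained
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

# Seed the row that later collides on the primary key
with Session(engine) as session:
    session.add(Post(id="TEST1234", title="original"))
    session.commit()

with Session(engine) as session:
    # Existing primary key: merge() loads the row and copies the new state onto it
    session.merge(Post(id="TEST1234", title="updated"))
    # Unknown primary key: merge() adds a new row instead
    session.merge(Post(id="TEST5678", title="brand new"))
    session.commit()

    titles = {post.id: post.title for post in session.query(Post).all()}

print(titles)
```

Using session.add() in place of the first merge() call would reproduce the IntegrityError from the question, because add() always results in an INSERT.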

The alternative is to get a list of the primary keys you would like to upsert, and query the database for any matching ids:

    # Imagine that post1, post5, and post1000 are posts objects with ids 1, 5 and 1000 respectively.
    # The goal is to "upsert" these posts.
    # We initialize a dict which maps id to the post object.

    my_new_posts = {1: post1, 5: post5, 1000: post1000}

    for each in posts.query.filter(posts.id.in_(my_new_posts.keys())).all():
        # Only merge those posts which already exist in the database
        db.session.merge(my_new_posts.pop(each.id))

    # Only add those posts which did not exist in the database
    db.session.add_all(my_new_posts.values())

    # Now we commit our modifications (merges) and inserts (adds) to the database!
    db.session.commit()
mgoldwasser answered Sep 18 '22 11:09



You can leverage the on_conflict_do_update variant of the PostgreSQL dialect's insert() construct (SQLAlchemy 1.1 or later, PostgreSQL 9.5 or later). A simple example would be the following:

    from sqlalchemy import Column, Integer, Unicode
    from sqlalchemy.dialects.postgresql import insert
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()


    class Post(Base):
        """
        A simple class for demonstration
        """

        __tablename__ = "posts"

        id = Column(Integer, primary_key=True)
        title = Column(Unicode)


    # Prepare all the values that should be "upserted" to the DB
    values = [
        {"id": 1, "title": "mytitle 1"},
        {"id": 2, "title": "mytitle 2"},
        {"id": 3, "title": "mytitle 3"},
        {"id": 4, "title": "mytitle 4"},
    ]

    stmt = insert(Post).values(values)
    stmt = stmt.on_conflict_do_update(
        # Let's use the constraint name which was visible in the question's error message
        constraint="posts_pkey",
        # The columns that should be updated on conflict
        set_={
            "title": stmt.excluded.title
        }
    )
    # `session` is assumed to be an existing Session, e.g. db.session in Flask-SQLAlchemy
    session.execute(stmt)

See the Postgres docs for more details about ON CONFLICT DO UPDATE.

See the SQLAlchemy docs for more details about on_conflict_do_update.
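If you want to see what such a statement sends to the server, it can be compiled against the PostgreSQL dialect without any database connection. The following is only a sketch, assuming SQLAlchemy 1.4 or newer; it names the conflict target through index_elements=["id"] rather than the constraint name, which is an alternative the same method accepts.

```python
from sqlalchemy import Column, Integer, Unicode
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Post(Base):
    # Same demonstration model as above, mapped to the "posts" table
    __tablename__ = "posts"

    id = Column(Integer, primary_key=True)
    title = Column(Unicode)


values = [
    {"id": 1, "title": "mytitle 1"},
    {"id": 2, "title": "mytitle 2"},
]

stmt = insert(Post.__table__).values(values)
stmt = stmt.on_conflict_do_update(
    # Conflict target given by column name instead of constraint name
    index_elements=["id"],
    # On conflict, overwrite title with the value from the rejected row
    set_={"title": stmt.excluded.title},
)

# Compiling against the dialect only renders SQL; nothing is executed
sql = str(stmt.compile(dialect=postgresql.dialect()))
print(sql)
```

The printed statement is a single multi-row INSERT followed by an ON CONFLICT (id) DO UPDATE clause, so the whole batch is upserted in one round trip.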

Side-Note on duplicated column names

The code above uses the column names as dict keys both in the values list and in the argument to set_. If a column name is changed in the class definition, it has to be changed in all of these places as well or the code will break. This can be avoided by accessing the column definitions, which makes the code a bit uglier but more robust:

    coldefs = Post.__table__.c

    values = [
        {coldefs.id.name: 1, coldefs.title.name: "mytitle 1"},
        ...
    ]

    stmt = stmt.on_conflict_do_update(
        ...
        set_={
            coldefs.title.name: stmt.excluded.title,
            ...
        }
    )
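Taking the same idea one step further, the set_ mapping can be derived from the table definition so that no column name is written by hand. This is a sketch only: build_upsert is a hypothetical helper (not part of SQLAlchemy), the body column is made up for illustration, and SQLAlchemy 1.4 or newer is assumed.

```python
from sqlalchemy import Column, Integer, Unicode
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Post(Base):
    __tablename__ = "posts"

    id = Column(Integer, primary_key=True)
    title = Column(Unicode)
    body = Column(Unicode)  # extra column, made up for this illustration


def build_upsert(model, rows):
    """Hypothetical helper: upsert rows, overwriting every non-key column."""
    table = model.__table__
    stmt = insert(table).values(rows)
    # The primary key columns identify the conflicting row
    key_names = [col.name for col in table.primary_key.columns]
    # Every other column takes its value from the rejected (excluded) row
    update_cols = {
        col.name: stmt.excluded[col.name]
        for col in table.columns
        if not col.primary_key
    }
    return stmt.on_conflict_do_update(index_elements=key_names, set_=update_cols)


rows = [
    {"id": 1, "title": "mytitle 1", "body": "text 1"},
    {"id": 2, "title": "mytitle 2", "body": "text 2"},
]
stmt = build_upsert(Post, rows)

# Render the SQL only; executing it would be session.execute(stmt)
print(stmt.compile(dialect=postgresql.dialect()))
```

The trade-off is that every non-key column is overwritten on conflict; if only some columns should be updated, the explicit set_ shown above is the safer choice.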
exhuma answered Sep 20 '22 11:09
