Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Dealing with duplicate primary keys on insert in SQLAlchemy (declarative style)

My application is using a scoped session and the declarative style of SQLALchemy. It's a web-app and a lot of the DB insertions are executed by Celery, a task scheduler.

Typically, when deciding to insert an object, my code might do something along the following lines:

from schema import Session from schema.models import Bike  pk = 123 # primary key bike = Session.query(Bike).filter_by(bike_id=pk).first() if not bike: # no bike in DB     new_bike = Bike(pk, "shiny", "bike")     Session.add(new_bike)     Session.commit() 

The issue here is that because a lot of this is done by asynchronous workers, it's possible for one working to be halfway though inserting a Bike with id=123, while another one is checking for its existence. In this case the second worker will try and insert a row with the same primary key, and SQLAlchemy will raise an IntegrityError.

I can't for the life of me find a nice way to deal with this issue apart from swapping out Session.commit() for:

'''schema/__init__.py''' from sqlalchemy.orm import scoped_session, sessionmaker Session = scoped_session(sessionmaker())  def commit(ignore=False):     try:         Session.commit()     except IntegrityError as e:         reason = e.message         logger.warning(reason)          if not ignore:             raise e          if "Duplicate entry" in reason:             logger.info("%s already in table." % e.params[0])             Session.rollback() 

And then everywhere I have Session.commit I now have schema.commit(ignore=True) where I don't mind that the row is not inserted again.

To me this seems very brittle because of the string checking. Just as an FYI, when an IntegrityError is raised it looks like this:

(IntegrityError) (1062, "Duplicate entry '123' for key 'PRIMARY'") 

So of course is the primary key I was inserting was something like Duplicate entry is a cool thing then I suppose I could miss IntegrityError's which weren't actually because of duplicate primary keys.

Are there any better approaches, which maintain the clean SQLAlchemy approach I'm using (as opposed to starting to write out statements in strings etc. . .)

Db is MySQL (though for unit testing I like to use SQLite, and wouldn't want to hinder that ability with any new approaches).

Cheers!

like image 713
Edwardr Avatar asked Apr 25 '12 19:04

Edwardr


1 Answers

If you use session.merge(bike) instead of session.add(bike), then you will not generate primary key errors. The bike will be retrieved and updated or created as needed.

like image 119
sirdodger Avatar answered Sep 18 '22 12:09

sirdodger