
SQLAlchemy IntegrityError and bulk data imports

I am inserting tens of thousands of records into a database with referential integrity rules. Unfortunately, some of the rows are duplicates (they already exist in the database). It would be too expensive to check whether every row exists before inserting it, so I intend to handle the IntegrityError exceptions thrown by SQLAlchemy: log the error and then continue.

My code will look something like this:

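import logging

from sqlalchemy.exc import IntegrityError

# establish connection to db etc.

tbl = obtain_binding_to_sqlalchemy_orm()
datarows = load_rows_to_import()

try:
    # one execute() call with a list of rows (executemany style)
    conn.execute(tbl.insert(), datarows)
except IntegrityError as ie:
    # log the error and keep going
    logging.warning("IntegrityError during import: %s", ie)
except Exception as e:
    # do something else (placeholder: re-raise)
    raise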

The (implicit) assumption I am making above is that SQLAlchemy is not rolling the multiple inserts into ONE transaction. If my assumption is wrong then it means that if an IntegrityError occurs, the rest of the insert is aborted. Can anyone confirm if the pseudocode "pattern" above will work as expected - or will I end up losing data as a result of thrown IntegrityError exceptions?

Also, if anyone has a better idea of doing this, I will be interested to hear it.

Homunculus Reticulli asked May 14 '12 15:05



1 Answer

It may work like this if you haven't started a transaction beforehand, because in that case SQLAlchemy's autocommit feature will kick in. But you should set it explicitly, as described in the link.
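To make the transaction concern concrete, here is a minimal sketch that uses the standard-library sqlite3 module in place of SQLAlchemy, so it runs without dependencies. The table `items`, the sample rows, and the helper `import_rows` are made up for illustration. It compares one bulk call against per-row inserts, each in its own transaction. With SQLAlchemy the equivalent would be to catch `sqlalchemy.exc.IntegrityError` around each row's insert; other backends and drivers may behave differently from SQLite here.

```python
import sqlite3

INSERT_SQL = "INSERT INTO items (id, name) VALUES (?, ?)"


def import_rows(conn, rows):
    """Insert rows one at a time; return the rows rejected as duplicates."""
    skipped = []
    for row in rows:
        try:
            # "with conn" commits on success and rolls back on an exception,
            # so each row gets its own small transaction.
            with conn:
                conn.execute(INSERT_SQL, row)
        except sqlite3.IntegrityError:
            skipped.append(row)  # log here in real code, then keep going
    return skipped


def ids(conn):
    return [r[0] for r in conn.execute("SELECT id FROM items ORDER BY id")]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO items VALUES (2, 'already there')")
conn.commit()

rows = [(1, "a"), (2, "duplicate"), (3, "c")]

# Bulk call in one transaction: the duplicate raises, the rollback also
# discards row 1, and row 3 is never attempted.
try:
    with conn:
        conn.executemany(INSERT_SQL, rows)
except sqlite3.IntegrityError:
    pass
after_bulk = ids(conn)

# Per-row inserts: only the duplicate is skipped.
skipped = import_rows(conn, rows)
after_per_row = ids(conn)
```

Committing once per row can be slow for tens of thousands of rows. A possible middle ground is to insert in chunks and fall back to per-row inserts only for a chunk that fails.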

mata answered Sep 30 '22 18:09
