Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Continue loading after IntegrityError

Tags:

python

sql

sqlite

In python, I am populating a SQLITE data base using the importmany, so I can import tens of thousands of rows of data at once. My data is contained as a list of tuples. I had my database set up with the primary keys where I wanted them.

Problem I ran into was primary key errors would throw up an IntegrityError. If I handle the exception my script stops importing at the primary key conflict.
try:

try:
    self.curs.executemany("INSERT into towers values (NULL,?,?,?,?)",self.insertList)
except IntegrityError:
    print "Primary key error"
conn.commit()


So my questions are, in python using importmany can I:

1. Capture the values that violate the primary key?
2. Continue loading data after I get my primary key errors.

I get why it doesnt continue to load, because after the exception I commit the data to the database. I dont know how to continue where I left off however.

Unforutnley I cannot copy and paste all the code on this network, any help would be greatly appreciated. Right now I have no PKs set as a work around...

like image 203
dan Avatar asked Jul 09 '10 11:07

dan


1 Answers

To answer (2) first, if you want to continue loading after you get an error, it's a simple fix on the SQL side:

INSERT OR IGNORE INTO towers VALUES (NULL,?,?,?,?)

This will successfully insert any rows that don't have any violations, and gracefully ignore the conflicts. Please do note however that the IGNORE clause will still fail on Foreign Key violations.

Another option for a conflict resolution clause in your case is: INSERT OR REPLACE INTO .... I strongly recommend the SQLite docs for more information on conflicts and conflict resolution.

As far as I know you cannot do both (1) and (2) simultaneously in an efficient way. You could possibly create a trigger to fire before insertions that can capture conflicting rows but this will impose a lot of unnecessary overhead on all of your insertions. (Someone please let me know if you can do this in a smarter way.) Therefore I would recommend you consider whether you truly need to capture the values of the conflicting rows or whether a redesign of your schema is required, if possible/applicable.

like image 65
yanniskatsaros Avatar answered Sep 30 '22 09:09

yanniskatsaros