Pandas to_sql with SQLAlchemy: duplicate entry error in MySQL

I am using pandas with SQLAlchemy to write to a MySQL database through DataFrame.to_sql. I set if_exists='append', as in df.to_sql(con=con, name='tablename', if_exists='append'). The program makes several small writes to the tables during the day, so I don't want the entire table overwritten with 'replace'. Periodically, I get a duplicate entry error:

sqla: valuesToCalc has error:  (IntegrityError) (1062, "Duplicate entry '0-0000-00-00-00:00:00' for key 'PRIMARY'") 'INSERT INTO valuesToCalc () VALUES ()' ()

Is there any way to add "ON DUPLICATE KEY UPDATE" to DataFrame.to_sql? Or do I have to stop using to_sql and go directly through SQLAlchemy? I was hoping not to.

asked Jul 22 '14 04:07 by user3863132


2 Answers

Not sure if you found an answer, but here's a workaround that worked for me:

Call .to_sql() on a temporary table, then run a query that updates the main table from the temp table, and finally drop the temp table. For example:

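from sqlalchemy import text

# Stage the new rows in a temporary table
df.to_sql(con=con, name='tablename_temp', if_exists='replace')
# con is a SQLAlchemy engine; begin() commits on success and closes the connection
with con.begin() as connection:
    connection.execute(text("INSERT INTO tablename SELECT * FROM tablename_temp ON DUPLICATE KEY UPDATE tablename.field_to_update=tablename_temp.field_to_update"))
    connection.execute(text('DROP TABLE tablename_temp'))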
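If the temp table feels heavy, another option may be the method argument of to_sql, which lets you supply your own insert function. Below is a minimal sketch, assuming pandas 0.24 or later and a SQLAlchemy version that provides on_duplicate_key_update in the MySQL dialect. The names build_upsert and mysql_upsert are made up for illustration, and the choice to overwrite every column on a key collision is an assumption you may want to narrow.

```python
from sqlalchemy.dialects.mysql import insert


def build_upsert(table, rows):
    """Build INSERT ... ON DUPLICATE KEY UPDATE for a list of row dicts.

    Hypothetical helper: on a key collision, every column is overwritten
    with the incoming value (an assumption, adjust to your table).
    """
    stmt = insert(table).values(rows)
    updates = {c.name: stmt.inserted[c.name] for c in table.columns}
    return stmt.on_duplicate_key_update(**updates)


def mysql_upsert(pd_table, conn, keys, data_iter):
    """Callable for DataFrame.to_sql(method=...).

    pandas passes its table wrapper, a SQLAlchemy connection,
    the column names, and an iterator over the row values.
    """
    rows = [dict(zip(keys, row)) for row in data_iter]
    result = conn.execute(build_upsert(pd_table.table, rows))
    return result.rowcount


# Usage (needs a live MySQL engine, so it is left as a comment):
# df.to_sql(con=con, name='tablename', if_exists='append', method=mysql_upsert)
```

This keeps the append call from the question and moves the duplicate handling into the INSERT statement itself.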
answered Oct 06 '22 07:10 by Nidal


Here is what I ended up doing:

    from sqlalchemy.exc import IntegrityError

    # df is a DataFrame
    # Insert one row at a time so that a duplicate only rejects that row
    for i in range(len(df)):
        try:
            # Try inserting the row
            df.iloc[i:i+1].to_sql(name="Table_Name", con=Engine_Name, if_exists='append', index=False)
        except IntegrityError:
            # Ignore duplicates
            pass
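Row-by-row inserts can get slow on larger frames. A possible batch variant, sketched here under the assumption that pandas 0.24 or later is available, pushes the "ignore duplicates" step into MySQL with INSERT IGNORE through the method argument of to_sql. The names build_insert_ignore and mysql_insert_ignore are hypothetical. Keep in mind that INSERT IGNORE downgrades other errors to warnings as well, not only duplicate keys.

```python
from sqlalchemy.dialects.mysql import insert


def build_insert_ignore(table, rows):
    """Build a MySQL INSERT IGNORE statement for a list of row dicts."""
    # prefix_with places the keyword right after INSERT
    return insert(table).values(rows).prefix_with("IGNORE")


def mysql_insert_ignore(pd_table, conn, keys, data_iter):
    """Callable for DataFrame.to_sql(method=...); rows with duplicate keys are skipped."""
    rows = [dict(zip(keys, row)) for row in data_iter]
    result = conn.execute(build_insert_ignore(pd_table.table, rows))
    return result.rowcount


# Usage (needs a live MySQL engine, so it is left as a comment):
# df.to_sql(name="Table_Name", con=Engine_Name, if_exists='append',
#           index=False, method=mysql_insert_ignore)
```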
answered Oct 06 '22 09:10 by NFern