I am using pandas with SQLAlchemy to write to a MySQL database using DataFrame.to_sql. I like to turn on the 'append' flag: df.to_sql(con=con, name='tablename', if_exists='append')
Since the program does several small writes to the tables during the day, I don't want the entire table overwritten with replace. Periodically, I get the duplicate entry error:
sqla: valuesToCalc has error: (IntegrityError) (1062, "Duplicate entry
'0-0000-00-00-00:00:00' for key 'PRIMARY'") 'INSERT INTO valuesToCalc () VALUES ()' ()
Is there any way to add the "ON DUPLICATE KEY UPDATE" syntax to pd.to_sql? Do I have to stop using to_sql and go directly with SQLAlchemy? I was hoping not to.
Not sure if you found an answer, but here's a workaround that worked for me: call .to_sql() on a temporary table, then use a query to update the main table from the temp table. After that you can drop the temp table. For example:
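from sqlalchemy import text

# Write the new rows to a temp table, replacing any previous copy
df.to_sql(con=con, name='tablename_temp', if_exists='replace')
# con is the SQLAlchemy engine; begin() commits on success and closes the connection
with con.begin() as connection:
    connection.execute(text("INSERT INTO tablename SELECT * FROM tablename_temp ON DUPLICATE KEY UPDATE tablename.field_to_update=tablename_temp.field_to_update"))
    connection.execute(text('DROP TABLE tablename_temp'))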
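If you need to update more than one column, writing that statement by hand gets tedious. Here is a minimal sketch of a helper that builds it from column lists. The function name build_upsert_from_staging_sql and the `id` column are made up for illustration, and it assumes MySQL syntax and that both tables share the listed columns. It uses an explicit column list instead of SELECT * so the column order of the temp table doesn't matter.

```python
def build_upsert_from_staging_sql(target, staging, columns, update_columns):
    """Build a MySQL INSERT ... SELECT ... ON DUPLICATE KEY UPDATE statement.

    Hypothetical helper: `columns` are copied from the staging table,
    `update_columns` are overwritten when the key already exists.
    """
    def q(identifier):
        # Backtick-quote an identifier, doubling any embedded backticks
        return "`" + identifier.replace("`", "``") + "`"

    col_list = ", ".join(q(c) for c in columns)
    assignments = ", ".join(
        f"{q(target)}.{q(c)} = {q(staging)}.{q(c)}" for c in update_columns
    )
    return (
        f"INSERT INTO {q(target)} ({col_list}) "
        f"SELECT {col_list} FROM {q(staging)} "
        f"ON DUPLICATE KEY UPDATE {assignments}"
    )


# Example with the table names from above; `id` is an assumed key column
sql = build_upsert_from_staging_sql(
    target="tablename",
    staging="tablename_temp",
    columns=["id", "field_to_update"],
    update_columns=["field_to_update"],
)
print(sql)
```

The resulting string can be passed to connection.execute(text(sql)) in place of the hand-written query.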
Here is what I ended up doing:
from sqlalchemy.exc import IntegrityError

# df is a DataFrame
num_rows = len(df)
# Iterate one row at a time
for i in range(num_rows):
    try:
        # Try inserting the row
        df.iloc[i:i+1].to_sql(name="Table_Name", con=Engine_Name, if_exists='append', index=False)
    except IntegrityError:
        # Ignore duplicates
        pass
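Inserting one row per to_sql call can get slow on larger frames. A possible alternative, sketched here and not tested against a real server, is to pass a custom callable through the method parameter of to_sql and let MySQL skip the duplicates with INSERT IGNORE. The names build_insert_ignore_sql and insert_ignore are my own. The sketch assumes a MySQL driver that uses %s placeholders (e.g. PyMySQL or mysqlclient), SQLAlchemy 1.4 or later for exec_driver_sql, and a table in the default schema. Keep in mind that INSERT IGNORE also downgrades some other errors to warnings, not only duplicate keys.

```python
def build_insert_ignore_sql(table_name, columns):
    """Build a parameterized MySQL INSERT IGNORE statement (hypothetical helper)."""
    cols = ", ".join("`" + c.replace("`", "``") + "`" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    quoted_table = "`" + table_name.replace("`", "``") + "`"
    return f"INSERT IGNORE INTO {quoted_table} ({cols}) VALUES ({placeholders})"


def insert_ignore(pd_table, conn, keys, data_iter):
    """Callable for DataFrame.to_sql(method=...).

    pandas calls this with its table wrapper, a SQLAlchemy connection,
    the column names, and an iterable of row tuples.
    """
    rows = [tuple(row) for row in data_iter]
    if not rows:
        return 0
    sql = build_insert_ignore_sql(pd_table.name, list(keys))
    # A list of tuples makes the driver run the statement once per row (executemany)
    result = conn.exec_driver_sql(sql, rows)
    return result.rowcount
```

It would be used as df.to_sql(name="Table_Name", con=Engine_Name, if_exists='append', index=False, method=insert_ignore), so the whole frame goes through in one call and rows whose key already exists are skipped.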