I want to incrementally append DataFrame rows to a MySQL DB while preventing duplicate entries from getting in.
I currently do this by looping over every row with df.apply() and issuing a MySQL INSERT IGNORE for each one, so only unique rows land in the database. But pandas.apply is very slow (~45 secs for 10k rows). I would like to use pandas.to_sql(), which pushes 10k rows in ~0.5 secs, but its append mode has no way to ignore duplicates. Is there an efficient and fast way to achieve this?
Input CSV
Date,Open,High,Low,Close,Volume
1994-01-03,111.7,112.75,111.55,112.65,0
1994-01-04,112.68,113.47,112.2,112.65,0
1994-01-05,112.6,113.63,112.3,113.0,0
1994-01-06,113.02,113.43,112.25,112.62,0
1994-01-07,112.55,112.8,111.5,111.88,0
1994-01-10,111.8,112.43,111.35,112.25,0
1994-01-11,112.18,112.88,112.05,112.4,0
1994-01-12,112.38,112.82,111.95,112.28,0
Code

nifty_data.to_sql(name='eod_data', con=engine, if_exists='append', index=False)  # option-1
nifty_data.apply(addToDb, axis=1)  # option-2

def addToDb(row):
    sql = "INSERT IGNORE INTO eod_data (date, open, high, low, close, volume) VALUES (%s,%s,%s,%s,%s,%s)"
    val = (row['Date'], row['Open'], row['High'], row['Low'], row['Close'], row['Volume'])
    mycursor.execute(sql, val)
    mydb.commit()
option-1: fast (~0.5 secs) but doesn't allow INSERT IGNORE
option-2: loops row by row and is very slow (~45 secs)
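As a point of comparison, most of option-2's cost is the per-row execute/commit round trip; batching all rows through cursor.executemany with a single commit closes much of the gap while keeping the INSERT IGNORE semantics. A minimal sketch, using the stdlib sqlite3 driver as a stand-in so it is self-contained (SQLite spells it INSERT OR IGNORE; with mysql.connector the call is the same shape but uses INSERT IGNORE and %s placeholders):

```python
import sqlite3

import pandas as pd

# Stand-in for the MySQL connection; the table mirrors eod_data with a
# primary key on Date so duplicate dates can actually be ignored.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE eod_data (Date TEXT PRIMARY KEY, Open REAL, '
             'High REAL, Low REAL, Close REAL, Volume INTEGER)')

nifty_data = pd.DataFrame({
    'Date': ['1994-01-03', '1994-01-04'],
    'Open': [111.7, 112.68], 'High': [112.75, 113.47],
    'Low': [111.55, 112.2], 'Close': [112.65, 112.65],
    'Volume': [0, 0],
})

sql = ('INSERT OR IGNORE INTO eod_data '
       '(Date, Open, High, Low, Close, Volume) VALUES (?,?,?,?,?,?)')
rows = list(nifty_data.itertuples(index=False, name=None))

# One batched call and one commit instead of one of each per row.
conn.executemany(sql, rows)
conn.executemany(sql, rows)  # re-sending the same rows inserts nothing
conn.commit()

count = conn.execute('SELECT COUNT(*) FROM eod_data').fetchone()[0]
print(count)  # 2 — duplicates were ignored
```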
You can create a temporary staging table:

nifty_data.to_sql(name='temporary_table', con=engine, if_exists='replace', index=False)

(using if_exists='replace' so the staging table is rebuilt on every run instead of accumulating stale rows). Then run an INSERT IGNORE statement from it:

with engine.begin() as cnx:
    insert_sql = 'INSERT IGNORE INTO eod_data (SELECT * FROM temporary_table)'
    cnx.execute(insert_sql)

Just make sure the column orders are the same, or declare them explicitly in the INSERT.
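Putting the two steps together, here is a self-contained sketch of the staging-table pattern. It uses an in-memory SQLite engine and SQLite's INSERT OR IGNORE as a stand-in so it runs anywhere; against MySQL you would only swap the create_engine() URL and write INSERT IGNORE instead:

```python
import pandas as pd
from sqlalchemy import create_engine, text

# In-memory SQLite stands in for MySQL here. eod_data needs a unique
# constraint (here, a primary key on Date) for IGNORE to have any effect.
engine = create_engine('sqlite://')
with engine.begin() as cnx:
    cnx.execute(text(
        'CREATE TABLE eod_data (Date TEXT PRIMARY KEY, Open REAL, '
        'High REAL, Low REAL, Close REAL, Volume INTEGER)'))

nifty_data = pd.DataFrame({
    'Date': ['1994-01-03', '1994-01-04'],
    'Open': [111.7, 112.68], 'High': [112.75, 113.47],
    'Low': [111.55, 112.2], 'Close': [112.65, 112.65],
    'Volume': [0, 0],
})

def insert_ignore(df, engine):
    # 1) fast bulk load into a staging table, rebuilt on every call
    df.to_sql('temporary_table', con=engine, if_exists='replace', index=False)
    # 2) copy across, silently skipping rows that violate the primary key
    with engine.begin() as cnx:
        cnx.execute(text(
            'INSERT OR IGNORE INTO eod_data SELECT * FROM temporary_table'))

insert_ignore(nifty_data, engine)
insert_ignore(nifty_data, engine)  # second call is a no-op: all duplicates

with engine.begin() as cnx:
    count = cnx.execute(text('SELECT COUNT(*) FROM eod_data')).scalar()
print(count)  # 2
```

The DataFrame columns here are created in the same order as eod_data's columns, which is what makes the bare SELECT * safe; with a pre-existing table it is safer to list the columns explicitly in both the INSERT and the SELECT.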