Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas to_sql insert ignore

Tags:

python

pandas

I want to incrementally keep adding data frame rows into MySQL DB avoiding any duplicate entries to go in MySQL.

I am currently doing this by looping through every row using df.apply()and calling MySQL insert ignore(duplicates) to add unique rows into MySQL database. But using pandas.apply is very slow(45 secs for 10k rows). I want to achieve this using pandas.to_sql() method which takes 0.5 secs to push 10k entries into DB but doesn't support ignore duplicate in append mode. Is there an efficient and fast way to achieve this?

Input CSV

Date,Open,High,Low,Close,Volume
1994-01-03,111.7,112.75,111.55,112.65,0
1994-01-04,112.68,113.47,112.2,112.65,0
1994-01-05,112.6,113.63,112.3,113.0,0
1994-01-06,113.02,113.43,112.25,112.62,0
1994-01-07,112.55,112.8,111.5,111.88,0
1994-01-10,111.8,112.43,111.35,112.25,0
1994-01-11,112.18,112.88,112.05,112.4,0
1994-01-12,112.38,112.82,111.95,112.28,0

code

nifty_data.to_sql(name='eod_data', con=engine, if_exists = 'append', index=False) # option-1 
nifty_data.apply(addToDb, axis=1) # option-2 

def addToDb(row):
    sql = "INSERT IGNORE INTO eod_data (date, open, high, low, close, volume) VALUES (%s,%s,%s,%s,%s,%s)"
    val = (row['Date'], row['Open'], row['High'], row['Low'], row['Close'], row['Volume'])
    mycursor.execute(sql, val)
    mydb.commit()`

option-1: doesn't allow insert ignore (~0.5 secs)

option-2: has to loop through and is very slow (~45 secs)

like image 369
Aditya Mertia Avatar asked Feb 04 '23 20:02

Aditya Mertia


1 Answers

You can create a temporary table:

nifty_data.to_sql(name='temporary_table', con=engine, if_exists = 'append', index=False)

And then run an INSERT IGNORE statement from that:

with engine.begin() as cnx:
    insert_sql = 'INSERT IGNORE INTO eod_data (SELECT * FROM temporary_table)'
    cnx.execute(insert_sql)

just make sure the column orders are the same or you might have to manually declare them.

like image 94
pedrogfp Avatar answered Feb 06 '23 15:02

pedrogfp