I have a 1,000,000
x 50
Pandas DataFrame
that I am currently writing to a SQL table using:
df.to_sql('my_table', con, index=False)
It takes an incredibly long time. I've seen various explanations about how to speed up this process online, but none of them seem to work for MSSQL.
If I try the method in:
Bulk Insert A Pandas DataFrame Using SQLAlchemy
then I get a no attribute copy_from
error.
If I try the multithreading method from:
http://techyoubaji.blogspot.com/2015/10/speed-up-pandas-tosql-with.html
then I get a QueuePool limit of size 5 overflow 10 reach, connection timed out
error.
Is there any easy way to speed up to_sql() to an MSSQL table? Either via BULK COPY or some other method, but entirely from within Python code?
I've used ctds to do a bulk insert that's a lot faster with SQL server. In example below, df is the pandas DataFrame. The column sequence in the DataFrame is identical to the schema for mydb.
import ctds
conn = ctds.connect('server', user='user', password='password', database='mydb')
conn.bulk_insert('table', (df.to_records(index=False).tolist()))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With