Speeding up Pandas to_sql()?

Question

I have a 1,000,000 x 50 Pandas DataFrame that I am currently writing to a SQL table using:

df.to_sql('my_table', con, index=False)

It takes an incredibly long time. I've seen various explanations about how to speed up this process online, but none of them seem to work for MSSQL.

If I try the method in:

Bulk Insert A Pandas DataFrame Using SQLAlchemy

then I get a "no attribute copy_from" error.
If I try the multithreading method from:

http://techyoubaji.blogspot.com/2015/10/speed-up-pandas-tosql-with.html

then I get a "QueuePool limit of size 5 overflow 10 reached, connection timed out" error.

Is there any easy way to speed up to_sql() to an MSSQL table? Either via BULK COPY or some other method, but entirely from within Python code?

Babu Arunachalam · Accepted Answer

I've used ctds to do a bulk insert, which is a lot faster with SQL Server. In the example below, df is the pandas DataFrame. The column order in the DataFrame is identical to that of the target table in mydb.

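import ctds

# 'server', 'user', 'password' and 'mydb' are placeholders for your own settings.
conn = ctds.connect('server', user='user', password='password', database='mydb')
# Convert the DataFrame to a list of plain tuples and bulk-insert them into 'table'.
conn.bulk_insert('table', df.to_records(index=False).tolist())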
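
With a 1,000,000 x 50 DataFrame, to_records(...).tolist() builds every row tuple in memory before anything is sent. If that becomes a problem, one option is to feed the rows in batches. The helper below is only a sketch under assumptions: bulk_insert_in_batches and its batch_size default are names I made up for illustration, and the only thing it expects from conn is a bulk_insert(table, rows) method like the one used above. I have not benchmarked it against the one-shot call.

```python
from itertools import islice


def bulk_insert_in_batches(conn, table, rows, batch_size=10_000):
    """Feed `rows` to conn.bulk_insert() in batches of at most `batch_size`.

    `conn` is assumed to expose bulk_insert(table, rows), as the ctds
    connection in the answer does. Returns the number of rows sent.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    row_iter = iter(rows)  # accept lists as well as lazy generators
    sent = 0
    while True:
        # Pull at most batch_size rows; an empty list means we are done.
        batch = list(islice(row_iter, batch_size))
        if not batch:
            break
        conn.bulk_insert(table, batch)
        sent += len(batch)
    return sent


# Hypothetical usage with the objects from the answer (not run here):
#   rows = df.itertuples(index=False, name=None)  # plain tuples, produced lazily
#   bulk_insert_in_batches(conn, 'table', rows)
```

df.itertuples(index=False, name=None) yields ordinary tuples one at a time, so only one batch is held as a list at any moment. As far as I recall, ctds's own bulk_insert also accepts an iterable and an optional batch_size argument, which may make a helper like this unnecessary; check the ctds documentation for your version.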

Tags: performance, python, import, sql, pandas

user1566200
