Speeding up Pandas to_sql()?

I have a 1,000,000 x 50 Pandas DataFrame that I am currently writing to a SQL table using:

df.to_sql('my_table', con, index=False)

It takes an incredibly long time. I've seen various explanations about how to speed up this process online, but none of them seem to work for MSSQL.

  1. If I try the method in:

    Bulk Insert A Pandas DataFrame Using SQLAlchemy

    then I get a "no attribute copy_from" error (copy_from is a psycopg2 method that only works with PostgreSQL, so that approach isn't available for MSSQL).

  2. If I try the multithreading method from:

    http://techyoubaji.blogspot.com/2015/10/speed-up-pandas-tosql-with.html

    then I get a "QueuePool limit of size 5 overflow 10 reached, connection timed out" error (see the sketch just after this list for what that limit means).
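
For context, that error comes from SQLAlchemy's connection pool: by default a QueuePool allows 5 persistent connections plus 10 overflow, so any additional worker threads time out waiting for a free connection. A minimal sketch of raising those limits when creating the engine; the connection URL is a placeholder:

from sqlalchemy import create_engine

# The default QueuePool allows pool_size=5 plus max_overflow=10
# connections; raise both so they cover the number of worker threads.
engine = create_engine(
    'mssql+pyodbc://user:password@server/mydb?driver=ODBC+Driver+17+for+SQL+Server',
    pool_size=20,
    max_overflow=40,
)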

Is there any easy way to speed up to_sql() to an MSSQL table? Either via BULK COPY or some other method, but entirely from within Python code?

asked Jan 09 '17 by user1566200
1 Answer

I've used ctds to do a bulk insert, and it's a lot faster with SQL Server. In the example below, df is the pandas DataFrame. The column order in df must match the column order of the target table in mydb.

import ctds

# Connect directly to SQL Server; ctds is a DB API 2 driver built on FreeTDS.
conn = ctds.connect('server', user='user', password='password', database='mydb')

# bulk_insert streams the rows using the TDS BULK COPY protocol; the rows
# must be in the same column order as the target table.
conn.bulk_insert('table', df.to_records(index=False).tolist())
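
If you'd rather keep using pandas' to_sql() itself, a commonly used alternative (not part of the original answer) is to enable pyodbc's fast_executemany on the SQLAlchemy engine, so rows are sent in batches instead of one round trip each. A minimal sketch, assuming SQLAlchemy 1.3+, pyodbc, and the ODBC Driver 17 for SQL Server; the connection URL and chunk size are placeholders:

from sqlalchemy import create_engine

# fast_executemany makes pyodbc send parameters in bulk rather than
# executing one INSERT per row (supported by the mssql+pyodbc dialect).
engine = create_engine(
    'mssql+pyodbc://user:password@server/mydb?driver=ODBC+Driver+17+for+SQL+Server',
    fast_executemany=True,
)

# chunksize bounds the size of each batch for a 1,000,000-row frame.
df.to_sql('my_table', engine, index=False, if_exists='append', chunksize=10000)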
answered Sep 21 '22 by Babu Arunachalam