pyodbc - very slow bulk insert speed

Tags:

With this table:

CREATE TABLE test_insert (
    col1 INT,
    col2 VARCHAR(10),
    col3 DATE
)

the following code takes 40 seconds to run:

import pyodbc

from datetime import date


conn = pyodbc.connect('DRIVER={SQL Server Native Client 10.0};'
    'SERVER=localhost;DATABASE=test;UID=xxx;PWD=yyy')

rows = []
row = [1, 'abc', date.today()]
for i in range(10000):
    rows.append(row)

cursor = conn.cursor()
cursor.executemany('INSERT INTO test_insert VALUES (?, ?, ?)', rows)

conn.commit()

The equivalent code with psycopg2 only takes 3 seconds. I don't think mssql is that much slower than postgresql. Any idea on how to improve the bulk insert speed when using pyodbc?

EDIT: Add some notes following ghoerz's discovery

In pyodbc, the flow of executemany is:

prepare statement
loop for each set of parameters
- bind the set of parameters
- execute

In ceODBC, the flow of executemany is:

prepare statement
bind all parameters
execute

373

asked Apr 17 '11 13:04

sayap

2 Answers

I was having a similar issue with pyODBC inserting into a SQL Server 2008 DB using executemany(). When I ran a profiler trace on the SQL side, pyODBC was creating a connection, preparing the parametrized insert statement, and executing it for one row. Then it would unprepare the statement, and close the connection. It then repeated this process for each row.

I wasn't able to find any solution in pyODBC that didn't do this. I ended up switching to ceODBC for connecting to SQL Server, and it used the parametrized statements correctly.

answered Oct 19 '22 18:10

ghoerz

Trying to insert +2M rows into MSSQL using pyodbc was taking an absurdly long amount of time compared to bulk operations in Postgres (psycopg2) and Oracle (cx_Oracle). I did not have the privileges to use the BULK INSERT operation, but was able to solve the problem with the method below.

Many solutions correctly suggested fast_executemany, however, there are some tricks to using it correctly. First, I noticed that pyodbc was committing after each row when autocommit was set to True in the connect method, therefore this must be set to False. I also observed a non-linear slow down when inserting more than ~20k rows at a time, i.e. inserting 10k rows was subsecond, but 50k was upwards of 20s. I assume that the transaction log is getting quite large and slowing the whole thing down. Therefore, you must chunk your insert and commit after each chunk. I found 5k rows per chunk delivered good performance, but this would obviously depend on many factors (the data, the machine, db config etc...).

import pyodbc

CHUNK_SIZE = 5000

def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in xrange(0, len(l), n): #use xrange in python2, range in python3
        yield l[i:i + n]

mssql_conn = pyodbc.connect(driver='{ODBC Driver 17 for SQL Server}',
                            server='<SERVER,PORT>',
                            timeout=1,
                            port=<PORT>,
                            uid=<UNAME>, 
                            pwd=<PWD>,
                            TDS_Version=7.2,
                            autocommit=False) #IMPORTANT

mssql_cur = mssql_conn.cursor()
mssql_cur.fast_executemany = True #IMPORTANT

params = [tuple(x) for x in df.values]

stmt = "truncate table <THE TABLE>"
mssql_cur.execute(stmt)
mssql_conn.commit()

stmt = """
INSERT INTO <THE TABLE> (field1...fieldn) VALUES (?,...,?)
"""
for chunk in chunks(params, CHUNK_SIZE): #IMPORTANT
    mssql_cur.executemany(stmt, chunk)
    mssql_conn.commit()

answered Oct 19 '22 19:10

Dave Lyndon

Related questions
                            
                                Moving unique_ptrs from one vector to another
                            
                                How do you "nest" or "group" Test::Unit tests?
                            
                                Connect two android phones with wi-fi (without laptop or access point) and send file
                            
                                NPM registry alternative to http://registry.npmjs.org/npm [closed]
                            
                                Recommend a development environment for HTML5/Javascript? [duplicate]
                            
                                Different compile flags for same file in different targets
                            
                                asp.net mvc multilanguage urls/routing
                            
                                Authentication for a Symfony2 api (for mobile app use)
                            
                                When are files from NSCachesDirectory removed?
                            
                                How do you automate the installation of Eclipse plugins with command line?
                            
                                How do you organize your Backbone files?
                            
                                Cortex A9 NEON vs VFP usage confusion

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

pyodbc - very slow bulk insert speed

Tags:

sql-server

bulkinsert

pyodbc

sayap

People also ask

2 Answers

ghoerz

Dave Lyndon

Recent Activity

Donate For Us