In Python, I have a process to select data from one database (Redshift, via psycopg2), then insert that data into SQL Server (via pyodbc). I chose to do a read / write rather than a read / flat file / load because the row count is around 100,000 per day; it seemed easier to simply connect and insert. However, the insert process is slow, taking several minutes.

Is there a better way to insert data into SQL Server with pyodbc?
select_cursor.execute(output_query)

done = False
rowcount = 0

while not done:
    rows = select_cursor.fetchmany(10000)
    insert_list = []

    if rows == []:
        done = True
        break

    for row in rows:
        rowcount += 1
        insert_params = (
            row[0],
            row[1],
            row[2]
        )
        insert_list.append(insert_params)

    insert_cnxn = pyodbc.connect('''Connection Information''')
    insert_cursor = insert_cnxn.cursor()

    insert_cursor.executemany("""
        INSERT INTO Destination (AccountNumber, OrderDate, Value)
        VALUES (?, ?, ?)
        """, insert_list)

    insert_cursor.commit()
    insert_cursor.close()
    insert_cnxn.close()

select_cursor.close()
select_cnxn.close()
I know that an INSERT on a SQL table can be slow for any number of reasons: the existence of INSERT triggers on the table, lots of enforced constraints that have to be checked (usually foreign keys), and page splits in the clustered index when a row is inserted in the middle of the table.
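If you want to rule those factors out for the destination table, one quick diagnostic is to query SQL Server's catalog views over the same pyodbc connection. This is only a sketch: it reuses the question's Destination table and its '''Connection Information''' placeholder, and assumes the table lives in the dbo schema.

import pyodbc

cnxn = pyodbc.connect('''Connection Information''')   # placeholder, as in the question
cursor = cnxn.cursor()

# Triggers defined on the destination table
cursor.execute("SELECT name FROM sys.triggers "
               "WHERE parent_id = OBJECT_ID('dbo.Destination')")
print("Triggers:", [row.name for row in cursor.fetchall()])

# Foreign key constraints that must be checked on every insert
cursor.execute("SELECT name FROM sys.foreign_keys "
               "WHERE parent_object_id = OBJECT_ID('dbo.Destination')")
print("Foreign keys:", [row.name for row in cursor.fetchall()])

cnxn.close()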
Both the 'Bulk insert with batch size' and 'Use single record insert' options are used for inserting records into a database table. The 'Bulk insert with batch size' option is used when you want the whole dataset to be loaded in batches of a specified size. Typically, larger batch sizes result in better transfer speeds.
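For example, here is a minimal sketch of batched inserts with pyodbc, reusing the cursors, connection, and table from the question but opening the insert connection once outside the loop; the batch size of 5,000 is only an assumed starting point to tune.

batch_size = 5000   # assumed starting point; tune for your environment
insert_sql = "INSERT INTO Destination (AccountNumber, OrderDate, Value) VALUES (?, ?, ?)"

while True:
    rows = select_cursor.fetchmany(batch_size)
    if not rows:
        break
    insert_cursor.executemany(insert_sql, [(r[0], r[1], r[2]) for r in rows])
    insert_cnxn.commit()   # one commit per batch rather than per row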
The high-level process for using BULK INSERT from a Python program: assemble the CREATE TABLE command for the table into which the data will be imported, execute the CREATE TABLE command from within your Python program using a cursor, then assemble the BULK INSERT command for the file to be imported and execute it the same way.
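A rough sketch of that route with pyodbc follows. The CSV path, the terminator options, and the assumption that the daily extract has already been written to a file readable by the SQL Server instance are all illustrative and not part of the original question; the CREATE TABLE step is skipped here because Destination already exists.

import pyodbc

cnxn = pyodbc.connect('''Connection Information''')   # placeholder, as in the question
cursor = cnxn.cursor()

# Hypothetical file path; SQL Server reads it server-side, so the file must be
# visible to (and readable by) the SQL Server service account.
cursor.execute(r"""
    BULK INSERT Destination
    FROM 'C:\data\daily_extract.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2)
""")
cnxn.commit()
cnxn.close()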
UPDATE: pyodbc 4.0.19 added a Cursor#fast_executemany option that can greatly improve performance by avoiding the behaviour described below. See this answer for details.
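A minimal sketch, assuming pyodbc 4.0.19 or later and reusing the insert connection and parameter list from the question; enabling the option is a one-line change before the .executemany call.

insert_cursor = insert_cnxn.cursor()
insert_cursor.fast_executemany = True   # available from pyodbc 4.0.19 onward

insert_cursor.executemany(
    "INSERT INTO Destination (AccountNumber, OrderDate, Value) VALUES (?, ?, ?)",
    insert_list)
insert_cnxn.commit()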
Your code does follow proper form (aside from the few minor tweaks mentioned in the other answer), but be aware that when pyodbc performs an .executemany, what it actually does is submit a separate sp_prepexec for each individual row. That is, for the code
sql = "INSERT INTO #Temp (id, txtcol) VALUES (?, ?)"
params = [(1, 'foo'), (2, 'bar'), (3, 'baz')]
crsr.executemany(sql, params)
SQL Server actually performs the following (as confirmed by SQL Profiler):
exec sp_prepexec @p1 output,N'@P1 bigint,@P2 nvarchar(3)',N'INSERT INTO #Temp (id, txtcol) VALUES (@P1, @P2)',1,N'foo'
exec sp_prepexec @p1 output,N'@P1 bigint,@P2 nvarchar(3)',N'INSERT INTO #Temp (id, txtcol) VALUES (@P1, @P2)',2,N'bar'
exec sp_prepexec @p1 output,N'@P1 bigint,@P2 nvarchar(3)',N'INSERT INTO #Temp (id, txtcol) VALUES (@P1, @P2)',3,N'baz'
So, for an .executemany "batch" of 10,000 rows you would be performing 10,000 individual inserts, with 10,000 round trips to the server, sending the identical SQL command text (INSERT INTO ...) 10,000 times.

It is possible to have pyodbc send an initial sp_prepare and then do an .executemany calling sp_execute, but the nature of .executemany is that you would still make 10,000 round trips to the server, just executing sp_execute instead of INSERT INTO .... That could improve performance if the SQL statement were quite long and complex, but for a short one like the example in your question it probably wouldn't make all that much difference.
One could also get creative and build "table value constructors" as illustrated in this answer, but notice that it is only offered as a "Plan B" when native bulk insert mechanisms are not a feasible solution.
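As a rough illustration of that idea (not the linked answer's exact code), rows can be packed into multi-row VALUES clauses. The chunk size of 500 is chosen so that 500 rows times 3 parameters stays under SQL Server's 2,100-parameter limit per statement, and a table value constructor is in any case capped at 1,000 rows.

def insert_with_tvc(cursor, rows, chunk_size=500):
    """Insert rows in chunks using a multi-row VALUES clause (table value constructor)."""
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        placeholders = ", ".join("(?, ?, ?)" for _ in chunk)
        sql = ("INSERT INTO Destination (AccountNumber, OrderDate, Value) "
               "VALUES " + placeholders)
        # Flatten the chunk's tuples into a single parameter list
        cursor.execute(sql, [value for row in chunk for value in row])

insert_with_tvc(insert_cursor, insert_list)
insert_cnxn.commit()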