How to speed up bulk insert to MS SQL Server using pyodbc

Below is my code that I'd like some help with. I have to run it over 1,300,000 rows, which means it takes up to 40 minutes to insert ~300,000 rows.

I figure bulk insert is the route to go to speed it up? Or is it slow because I'm iterating over the rows via the for data in reader: portion?

#Opens the prepped csv file
with open (os.path.join(newpath,outfile), 'r') as f:
    #hooks csv reader to file
    reader = csv.reader(f)
    #pulls out the columns (which match the SQL table)
    columns = next(reader)
    #trims any extra spaces
    columns = [x.strip(' ') for x in columns]
    #starts SQL statement
    query = 'bulk insert into SpikeData123({0}) values ({1})'
    #puts column names in SQL query 'query'
    query = query.format(','.join(columns), ','.join('?' * len(columns)))

    print 'Query is: %s' % query
    #starts curser from cnxn (which works)
    cursor = cnxn.cursor()
    #uploads everything by row
    for data in reader:
        cursor.execute(query, data)
        cursor.commit()

I am dynamically picking my column headers on purpose (as I would like to create the most pythonic code possible).

SpikeData123 is the table name.

asked Apr 14 '15 by TangoAlee


People also ask

How do I make inserts into SQL Server 100x faster with PyODBC?

Usually, to speed up the inserts with pyodbc, I tend to use the feature cursor.fast_executemany = True, which significantly speeds up the inserts.
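
For example, a minimal sketch of that approach, assuming pyodbc 4.0.19 or later and a table like the SpikeData123 from the question (the column names and sample rows here are only illustrative):

import pyodbc

cnxn = pyodbc.connect("DSN=myDb_SQLEXPRESS;")   # placeholder DSN
crsr = cnxn.cursor()
crsr.fast_executemany = True                    # available in pyodbc 4.0.19+

# hypothetical column names; the real ones come from the CSV header
sql = "INSERT INTO SpikeData123 (colA, colB, colC) VALUES (?, ?, ?)"
rows = [('1', 'Thompson, Gord', '2015-04-15'),
        ('2', 'Loblaw, Bob', '2015-04-07')]

crsr.executemany(sql, rows)   # parameters are sent in batches, not one round trip per row
cnxn.commit()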

How do I make my SQL insert faster?

The easiest solution is to simply batch your commits, e.g. commit every 1,000 inserts, or every second. This fills up the log pages and amortizes the cost of the log-flush wait over all the inserts in a transaction.
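
Applied to the loop in the question (reusing its reader, query, cursor and cnxn), that would look roughly like this; the batch size of 1,000 is only an example:

BATCH_SIZE = 1000   # example value; tune for your workload
for i, data in enumerate(reader, start=1):
    cursor.execute(query, data)
    if i % BATCH_SIZE == 0:
        cnxn.commit()   # flush the log once per batch instead of once per row
cnxn.commit()           # commit any remaining rows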

Why bulk insert is faster than insert?

In the case of BULK INSERT, only extent allocations are logged instead of the actual data being inserted, which provides much better performance than INSERT. The real advantage is that it reduces the amount of data written to the transaction log.


2 Answers

Update - May 2022: bcpandas and bcpyaz are wrappers for Microsoft's bcp utility.


Update - April 2019: As noted in the comment from @SimonLang, BULK INSERT under SQL Server 2017 and later apparently does support text qualifiers in CSV files (ref: here).
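
A minimal sketch of what that looks like on SQL Server 2017 or later, using the FORMAT and FIELDQUOTE options of BULK INSERT (the file path and table name are just placeholders):

import pypyodbc

cnxn = pypyodbc.connect("DSN=myDb_SQLEXPRESS;")
crsr = cnxn.cursor()
sql = """
BULK INSERT myDb.dbo.SpikeData123
FROM 'C:\\__tmp\\biTest.csv' WITH (
    FORMAT = 'CSV',      -- SQL Server 2017 and later only
    FIELDQUOTE = '"',    -- text qualifier for quoted values
    FIRSTROW = 2         -- skip the header row
    );
"""
crsr.execute(sql)
cnxn.commit()
crsr.close()
cnxn.close()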


BULK INSERT will almost certainly be much faster than reading the source file row-by-row and doing a regular INSERT for each row. However, both BULK INSERT and BCP have a significant limitation regarding CSV files in that they cannot handle text qualifiers (ref: here). That is, if your CSV file does not have qualified text strings in it ...

1,Gord Thompson,2015-04-15
2,Bob Loblaw,2015-04-07

... then you can BULK INSERT it, but if it contains text qualifiers (because some text values contain commas) ...

1,"Thompson, Gord",2015-04-15
2,"Loblaw, Bob",2015-04-07

... then BULK INSERT cannot handle it. Still, it might be faster overall to pre-process such a CSV file into a pipe-delimited file ...

1|Thompson, Gord|2015-04-15
2|Loblaw, Bob|2015-04-07

... or a tab-delimited file (where → represents the tab character) ...

1→Thompson, Gord→2015-04-15
2→Loblaw, Bob→2015-04-07

... and then BULK INSERT that file. For the latter (tab-delimited) file the BULK INSERT code would look something like this:

import pypyodbc
conn_str = "DSN=myDb_SQLEXPRESS;"
cnxn = pypyodbc.connect(conn_str)
crsr = cnxn.cursor()
sql = """
BULK INSERT myDb.dbo.SpikeData123
FROM 'C:\\__tmp\\biTest.txt' WITH (
    FIELDTERMINATOR='\\t',
    ROWTERMINATOR='\\n'
    );
"""
crsr.execute(sql)
cnxn.commit()
crsr.close()
cnxn.close()
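
As for the pre-processing step itself, a minimal sketch in Python (reusing newpath and outfile from the question; the output path is a placeholder, and it assumes the values themselves contain no tabs or embedded newlines):

import csv
import os

src = os.path.join(newpath, outfile)   # the prepped CSV from the question
dst = 'C:\\__tmp\\biTest.txt'          # placeholder output path

with open(src, 'r') as fin, open(dst, 'w') as fout:
    reader = csv.reader(fin)
    next(reader)                       # skip the header row; BULK INSERT will load data rows only
    for row in reader:
        fout.write('\t'.join(row) + '\n')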

Note: As mentioned in a comment, executing a BULK INSERT statement is only applicable if the SQL Server instance can directly read the source file. For cases where the source file is on a remote client, see this answer.

answered by Gord Thompson


Yes, bulk insert is the right path for loading large files into a DB. At a glance I would say that the reason it takes so long is, as you mentioned, that you are looping over each row of data from the file, which effectively removes the benefits of using a bulk insert and makes it like a normal insert. Just remember that, as its name implies, it is used to insert chunks of data. I would remove the loop and try again.

Also, I'd double-check your syntax for the bulk insert, as it doesn't look correct to me. Check the SQL that is generated by pyodbc, as I have a feeling that it might only be executing a normal insert.

Alternatively, if it is still slow, I would try using BULK INSERT directly from SQL: either load the whole file into a temp table with BULK INSERT and then insert the relevant columns into the right tables, or use a mix of BULK INSERT and bcp to get the specific columns inserted, or use OPENROWSET. A sketch of the staging-table approach is shown below.
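
For the staging-table route, a rough sketch (the staging column names, data types and file path are hypothetical, and it assumes a tab-delimited file as in the answer above):

import pypyodbc

cnxn = pypyodbc.connect("DSN=myDb_SQLEXPRESS;")
crsr = cnxn.cursor()

# 1) load the whole file into a staging table
crsr.execute("CREATE TABLE dbo.SpikeData123_staging (colA VARCHAR(50), colB VARCHAR(50), colC VARCHAR(50));")
crsr.execute("""
BULK INSERT dbo.SpikeData123_staging
FROM 'C:\\__tmp\\biTest.txt' WITH (FIELDTERMINATOR='\\t', ROWTERMINATOR='\\n');
""")

# 2) copy only the relevant columns into the real table, then drop the staging table
crsr.execute("INSERT INTO dbo.SpikeData123 (colA, colB) SELECT colA, colB FROM dbo.SpikeData123_staging;")
crsr.execute("DROP TABLE dbo.SpikeData123_staging;")

cnxn.commit()
crsr.close()
cnxn.close()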

answered by Michael Moura