As referenced, I've created a collection of data (40k rows, 5 columns) within Python that I'd like to insert back into a SQL Server table.
Typically, within SQL I'd run a 'select * into myTable from dataTable' statement to do the insert, but having the data sit in a pandas DataFrame obviously complicates this.
I'm not formally opposed to using SQLAlchemy (though I'd prefer to avoid another download and install), but would prefer to do this natively within Python, and I'm connecting to SQL Server using pyodbc.
Is there a straightforward way to do this that avoids looping (i.e., inserting row by row)?
In my previous article in the series, I explained how to create an engine using the SQLAlchemy module and how to connect to databases in Python with the Pandas module. You can connect to any supported database, such as MySQL, SQL Server, PostgreSQL, SQLite, etc.
Create a DataFrame by calling the pandas DataFrame constructor and passing a Python dict as the data. Then invoke the to_sql() method on the DataFrame instance, specifying the table name and the database connection. This creates a table in the MySQL database server and populates it with the data from the DataFrame.
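As a minimal sketch of that workflow (the connection URL, credentials, and table name below are placeholder assumptions, not values from the article):

import pandas as pd
from sqlalchemy import create_engine

# Build a DataFrame from a plain Python dict
data = {"id": [1, 2, 3], "name": ["alpha", "beta", "gamma"]}
df = pd.DataFrame(data)

# Engine for a hypothetical MySQL database (requires a driver such as pymysql)
engine = create_engine("mysql+pymysql://user:password@localhost:3306/mydb")

# Write the DataFrame to a table; if_exists controls what happens when the
# table already exists ("fail", "replace", or "append")
df.to_sql("my_table", con=engine, if_exists="replace", index=False)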
As shown in this answer, we can convert a DataFrame named df into a list of tuples by doing list(df.itertuples(index=False, name=None)), so we can pass that to executemany without (explicitly) looping through each row.
crsr = cnxn.cursor()
# fast_executemany sends the parameter sets in bulk instead of executing
# one INSERT round trip per row
crsr.fast_executemany = True
crsr.executemany(
    "INSERT INTO #tablename (col1, col2) VALUES (?, ?)",
    list(df.itertuples(index=False, name=None))
)
# commit() on the cursor commits the underlying connection's transaction
crsr.commit()
That is as "native" as you'll get, but it can lead to errors if the DataFrame contains pandas data types that are not recognized by pyodbc (which expects Python types as parameter values). You may still be better off using SQLAlchemy and pandas' to_sql method.
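If you do go that route, a sketch of that approach might look like the following; the connection string and table name are purely illustrative, and the fast_executemany flag on create_engine (available for the mssql+pyodbc dialect in SQLAlchemy 1.3+) enables the same bulk parameter binding shown above:

import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string; adjust server, database, and ODBC driver to your setup
engine = create_engine(
    "mssql+pyodbc://user:password@myserver/mydb?driver=ODBC+Driver+17+for+SQL+Server",
    fast_executemany=True,
)

# Append the DataFrame's rows to an existing table (or use if_exists="replace")
df.to_sql("tablename", con=engine, if_exists="append", index=False)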