Most examples I can find showing a full MSSQL connection method with Python are out of date as of a few months ago, thanks in part to some optimisations in SQLAlchemy 1.3. I'm trying to replicate what I can see in the documentation, but I'm having trouble getting SQLAlchemy connected to MS SQL Server using pyodbc.
I have a local SQL Server instance, reachable from SQL Server Management Studio at: DESKTOP-QLSOTTG\SQLEXPRESS
The Database is: TestDB
The username, for this example is: TestUser
The password, for this example is: TestUserPass
I want to run a test case (or cases) of importing a pandas DataFrame into the MSSQL database, to work out the speediest way of doing things. However, this question is about connectivity.
Credit: I borrowed some code from Gord for the dataframe/update here.
import pandas as pd
import numpy as np
import time
from sqlalchemy import create_engine, event
from urllib.parse import quote_plus
# for pyodbc
#engine = create_engine('mssql+pyodbc://TestUser:TestUserPAss@DESKTOP-QLSOTTG\\SQLEXPRESS:1433/TestDB?driver=ODBC+Driver+17+for+SQL+Server', fast_executemany=True)
engine = create_engine("mssql+pyodbc://TestUser:TestUserPass@DESKTOP-QLSOTTG\\SQLEXPRESS:1433/TestDB?driver=ODBC+Driver+13+for+SQL+Server", fast_executemany=True)
# test data
num_rows = 10000
num_cols = 100
df = pd.DataFrame(
[[f'row{x:04}col{y:03}' for y in range(num_cols)] for x in range(num_rows)],
columns=[f'col{y:03}' for y in range(num_cols)]
)
t0 = time.time()
df.to_sql("sqlalchemy_test", engine, if_exists='replace', index=False)
print(f"pandas wrote {num_rows} rows in {(time.time() - t0):0.1f} seconds")
The error I'm receiving is below. I assume that the server 'actively refusing the connection' means my connection string is somehow messed up, but I can't see why:
OperationalError: (pyodbc.OperationalError) ('08001', '[08001] [Microsoft][ODBC Driver 13 for SQL Server]TCP Provider: No connection could be made because the target machine actively refused it.\r\n (10061) (SQLDriverConnect); [08001] [Microsoft][ODBC Driver 13 for SQL Server]Login timeout expired (0); [08001] [Microsoft][ODBC Driver 13 for SQL Server]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online. (10061)')
(Background on this error at: http://sqlalche.me/e/13/e3q8)
The database and user are connectable from SQL Server Management Studio.
Any ideas on what I might be missing?
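For reference, the quote_plus import above was meant for building the URL from a raw ODBC connection string, which sidesteps URL-escaping issues with the instance backslash; I also suspect that hard-coding port 1433 may itself cause the refusal, since a named SQLEXPRESS instance usually listens on a dynamic port resolved by the SQL Browser service. A sketch of the odbc_connect style, using the example credentials from above and untested against a live server:

```python
# Sketch: build the SQLAlchemy URL from a full ODBC connection string via
# quote_plus, so spaces, braces and the instance backslash survive intact.
# Server/DB/credentials are the example values from this question; note the
# port is omitted so SQL Browser can resolve the named instance.
from urllib.parse import quote_plus

odbc_str = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=DESKTOP-QLSOTTG\\SQLEXPRESS;"
    "DATABASE=TestDB;"
    "UID=TestUser;"
    "PWD=TestUserPass;"
)
url = f"mssql+pyodbc:///?odbc_connect={quote_plus(odbc_str)}"
print(url)
# engine = create_engine(url, fast_executemany=True)  # untested here
```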
Notes:
I'm going to answer this with a fully worked example, because I had a few other issues along the way.
The example imports a large pandas DataFrame into SQL Server in chunks, with a connection string that actually works.
I've also included some untested connection strings for a few other DB providers. Versions of pandas, sqlalchemy, pyodbc, etc. are current as of December 2020.
%%time #remove this if you are not using a Jupyter notebook and just want to run a .py script
import pandas as pd
import numpy as np
import sqlalchemy as sql
import sys
import math
# Enterprise DB to be used
DRIVER = "ODBC Driver 17 for SQL Server".replace(" ", "+")  # '+' for spaces keeps the URL query string valid
USERNAME = "TestUser"
PSSWD = "TestUserPass"
SERVERNAME = "DESKTOP-QLSOTTG"
INSTANCENAME = "\\SQLEXPRESS"  # escape the backslash
DB = "TestDB"
TABLE = "perftest"
conn_executemany = sql.create_engine(
f"mssql+pyodbc://{USERNAME}:{PSSWD}@{SERVERNAME}{INSTANCENAME}/{DB}?driver={DRIVER}", fast_executemany=True
)
def chunker(seq, size):
return (seq[pos : pos + size] for pos in range(0, len(seq), size))
def insert_with_progress(df, engine, table="", schema=""):
    con = engine.connect()
    # Replace the table; qualify with the schema only if one was given
    target = f"{schema}.{table}" if schema else table
    con.execute(f"DROP TABLE IF EXISTS {target};")
    # Insert chunk by chunk, keeping each executemany batch a sane size
    SQL_SERVER_CHUNK_LIMIT = 100000
    chunksize = math.floor(SQL_SERVER_CHUNK_LIMIT / len(df.columns))
    for chunk in chunker(df, chunksize):
        chunk.to_sql(
            name=table,
            con=con,
            if_exists="append",
            index=False
        )
df = pd.DataFrame(np.random.random((10 ** 7, 24)))
df['TextCol'] = "Test Goes Here"
df.head()
print("DataFrame is", round(sys.getsizeof(df) / 1024 ** 2, 1), "MB")
print("DataFrame contains", len(df), "rows by", len(df.columns), "columns")
# Calling pandas directly like this errors out; I can't seem to debug the straight pandas call.
# df.to_sql(TABLE, conn_sqlalchemy, index=False, if_exists='replace', method='multi', chunksize=2100)
insert_with_progress(df, conn_executemany, table=TABLE)
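A guess at why the commented-out straight to_sql call errors out: with method='multi', pandas binds one parameter per cell, and SQL Server caps a single prepared statement at 2100 parameters, so chunksize=2100 is far too large for a 25-column frame. The sketch below checks that arithmetic, plus the chunker helper's slicing; the 2100 cap and the 25-column figure are assumptions matching the test DataFrame above.

```python
import math

import pandas as pd

def chunker(seq, size):
    # same helper as in the script: successive row-slices of at most `size`
    return (seq[pos : pos + size] for pos in range(0, len(seq), size))

num_cols = 25                      # 24 numeric columns + TextCol
PARAM_LIMIT = 2100                 # SQL Server per-statement parameter cap
safe_chunksize = math.floor(PARAM_LIMIT / num_cols)
print(safe_chunksize)              # 84 rows keeps each multi-row INSERT legal

demo = pd.DataFrame({"a": range(10)})
print([len(part) for part in chunker(demo, 4)])  # slices of 4 + 4 + 2 rows
```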
About the connection strings:
The "mssql+pyodbc://" prefix is the part to change if you wish to switch to another DB provider.
Alternate import statements for the other providers include:
import pymssql as ms
import sqlalchemy as sql
import sqlalchemy_turbodbc as st
Alternate connection strings:
Credit for the DSN-style strings, which I've modified to work with username/password.
conn_sqlalchemy = sql.create_engine(f"mssql+pyodbc://{USERNAME}:{PSSWD}@{SERVERNAME}{INSTANCENAME}/{DB}?driver={DRIVER}")
conn_executemany = sql.create_engine(
f"mssql+pyodbc://{USERNAME}:{PSSWD}@{SERVERNAME}{INSTANCENAME}/{DB}?driver={DRIVER}", fast_executemany=True
)
conn_turbodbc = sql.create_engine(f"mssql+turbodbc://{USERNAME}:{PSSWD}@{SERVERNAME}{INSTANCENAME}/{DB}?driver={DRIVER}")
conn_pymssql = sql.create_engine(f"mssql+pymssql://{USERNAME}:{PSSWD}@{SERVERNAME}{INSTANCENAME}/{DB}")
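One caveat with all of the username/password styles above: if the credentials ever contain URL-special characters (@, :, /), run them through quote_plus before interpolating, or SQLAlchemy will mis-parse the host portion. A sketch with a made-up password chosen to show the encoding (the pymssql URL shape matches conn_pymssql above):

```python
from urllib.parse import quote_plus

# Hypothetical credentials for illustration; the password deliberately
# contains characters that would break a naive f-string URL.
username, password = "TestUser", "p@ss:word/1"
url = (
    f"mssql+pymssql://{quote_plus(username)}:{quote_plus(password)}"
    f"@DESKTOP-QLSOTTG\\SQLEXPRESS/TestDB"
)
print(url)  # the @ : / in the password come out percent-encoded
```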