SQLAlchemy to MSSQL using create_engine

Most examples I can find showing a full MSSQL connection method with Python are out of date as of a few months back, thanks in part to some optimisations in SQLAlchemy 1.3. I'm trying to replicate what I can see in the documentation.

I'm having trouble getting SQLAlchemy connected to MSSQL Server using pyodbc.

I have a local SQL server, reachable from SQL Server Management Studio on: DESKTOP-QLSOTTG\SQLEXPRESS
The Database is: TestDB
The username for this example is: TestUser
The password for this example is: TestUserPass

I want to run a test case (or several) importing a pandas DataFrame into the MSSQL database, to work out the speediest way of doing things. However, this question is about connectivity.

Credit: I borrowed some code from Gord for the dataframe/update here.

import pandas as pd
import numpy as np
import time
from sqlalchemy import create_engine, event
from urllib.parse import quote_plus


# for pyodbc
#engine = create_engine('mssql+pyodbc://TestUser:TestUserPAss@DESKTOP-QLSOTTG\\SQLEXPRESS:1433/TestDB?driver=ODBC+Driver+17+for+SQL+Server', fast_executemany=True)
engine = create_engine("mssql+pyodbc://TestUser:TestUserPass@DESKTOP-QLSOTTG\\SQLEXPRESS:1433/TestDB?driver=ODBC+Driver+13+for+SQL+Server", fast_executemany=True)

# test data
num_rows = 10000
num_cols = 100
df = pd.DataFrame(
    [[f'row{x:04}col{y:03}' for y in range(num_cols)] for x in range(num_rows)],
    columns=[f'col{y:03}' for y in range(num_cols)]
)

t0 = time.time()
df.to_sql("sqlalchemy_test", engine, if_exists='replace', index=None)
print(f"pandas wrote {num_rows} rows in {(time.time() - t0):0.1f} seconds")

The error I'm receiving is the following. I'm going to assume the server 'actively refusing the connection' is because my connection string is somehow messed up, but I can't seem to see why:

OperationalError: (pyodbc.OperationalError) ('08001', '[08001] [Microsoft][ODBC Driver 13 for SQL Server]TCP Provider: No connection could be made because the target machine actively refused it.\r\n (10061) (SQLDriverConnect); [08001] [Microsoft][ODBC Driver 13 for SQL Server]Login timeout expired (0); [08001] [Microsoft][ODBC Driver 13 for SQL Server]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online. (10061)')
(Background on this error at: http://sqlalche.me/e/13/e3q8)

I can connect to the database with this user from SQL Server Management Studio.

Any ideas on what I might be missing?

SQL Server Screenshot

Notes:

  • Changing DESKTOP-QLSOTTG\SQLEXPRESS:1433 to DESKTOP-QLSOTTG:1433 didn't change the error
  • Changing DESKTOP-QLSOTTG\SQLEXPRESS:1433 to localhost:1433 didn't change the error
  • Changing DESKTOP-QLSOTTG\SQLEXPRESS:1433 to localhost\SQLEXPRESS:1433 didn't change the error

1 Answer

I'm going to answer this with a fully worked example, because I had a few other issues along the way.

This example is able to:

  • Use fast_executemany with user-specified, memory-friendly chunking to load data into an MS SQL database very quickly.
  • Load 10,000 records (25 columns) to a Microsoft SQL (MSSQL) database in about 0.3 seconds.
  • Load 1,000,000 records (25 columns) to a Microsoft SQL (MSSQL) database in about 45 seconds.
  • Load 10,000,000 records (25 columns) to a Microsoft SQL (MSSQL) database in about 9 minutes.
  • Chunk data using a preconfigured function that avoids pandas' chunksize argument, which causes memory errors on larger datasets. Credit for chunking.
  • Help load data into a SQL Server instance with low RAM: bulk loads take a lot of memory, while smaller chunk sizes use far less.
  • Optionally wrap each chunk load in a try/except statement to catch errors for logging or try-again-later setups (see the sketch after the main code block below).

I've included some untested connection strings for a few other DB providers. Package versions (pandas, SQLAlchemy, pyodbc, etc.) are current as of December 2020.

%%time
# Remove the %%time cell magic above if you are not using a Jupyter notebook and just want to run a .py script

import pandas as pd
import numpy as np
import sqlalchemy as sql
import sys
import math

# Enterprise DB to be used
DRIVER = "ODBC Driver 17 for SQL Server"
USERNAME = "TestUser"
PSSWD = "TestUserPass"
SERVERNAME = "DESKTOP-QLSOTTG"
INSTANCENAME = "\\SQLEXPRESS"  # keep the leading backslash for a named instance; use "" for a default instance
DB = "TestDB"
TABLE = "perftest"


conn_executemany = sql.create_engine(
    f"mssql+pyodbc://{USERNAME}:{PSSWD}@{SERVERNAME}{INSTANCENAME}/{DB}?driver={DRIVER}", fast_executemany=True
)



def chunker(seq, size):
    # Yield successive slices of seq of length size (the last slice may be shorter)
    return (seq[pos : pos + size] for pos in range(0, len(seq), size))


def insert_with_progress(df, engine, table="", schema=""):
    con = engine.connect()

    # Replace table (handle an empty schema so the SQL doesn't become ".perftest")
    qualified_table = f"{schema}.{table}" if schema else table
    engine.execute(f"DROP TABLE IF EXISTS {qualified_table};")

    # Insert in chunks sized so that rows * columns stays under the chunk limit
    SQL_SERVER_CHUNK_LIMIT = 100000
    chunksize = math.floor(SQL_SERVER_CHUNK_LIMIT / len(df.columns))

    for chunk in chunker(df, chunksize):
        chunk.to_sql(
            name=table,
            con=con,
            schema=schema if schema else None,
            if_exists="append",
            index=False
        )

    con.close()
        
# Test data: 10 million rows of 24 numeric columns plus one text column
df = pd.DataFrame(np.random.random((10 ** 7, 24)))
df['TextCol'] = "Test Goes Here"
df.head()
print("DataFrame is", round(sys.getsizeof(df) / 1024 ** 2, 1), "MB")
print("DataFrame contains", len(df), "rows by", len(df.columns), "columns")


# Doing it like this errors out. Can't seem to be able to debug the straight pandas call.
# df.to_sql(TABLE, conn_sqlalchemy, index=False, if_exists='replace', method='multi', chunksize=2100)

insert_with_progress(df, conn_executemany, table=TABLE)
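
As noted in the bullet list above, you can wrap each chunk load in try/except so a failed chunk is logged and can be retried later. The sketch below reuses chunker from the code above; the logging call and the failed_chunks bookkeeping are illustrative additions rather than part of the original function, and the drop-table step is deliberately omitted (append only):

import logging
import math

def insert_with_progress_logged(df, engine, table="", schema=""):
    # Same chunked append as insert_with_progress, but each chunk is wrapped in
    # try/except so failures are logged and the failing chunk numbers are
    # returned for a retry-later setup.
    con = engine.connect()
    SQL_SERVER_CHUNK_LIMIT = 100000
    chunksize = math.floor(SQL_SERVER_CHUNK_LIMIT / len(df.columns))
    failed_chunks = []

    for i, chunk in enumerate(chunker(df, chunksize)):
        try:
            chunk.to_sql(name=table, con=con, if_exists="append", index=False)
        except Exception:
            logging.exception("Chunk %d failed to load", i)
            failed_chunks.append(i)

    con.close()
    return failed_chunks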

About the connection strings:

  1. You will most likely only need to alter the line beginning with f"mssql+pyodbc:// if you wish to change to another DB type.
  2. If your SQL Server doesn't use an instance name (e.g. SQLSERVERNAME\Instance_Name) then you can set the instance name variable to an empty string (see the sketch after this list).
  3. If you do use an instance name, ensure you keep the \ at the start of the variable.
  4. If you use a different connection string, you will also need to replace the engine variable name in the last line of the code window above.
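
To illustrate points 2 and 3, here is a hedged sketch (using the same hypothetical credentials as above) of how the URL is built with and without an instance name, and how to URL-encode the password in case it contains characters such as @ or /:

from urllib.parse import quote_plus
import sqlalchemy as sql

DRIVER = "ODBC Driver 17 for SQL Server"
USERNAME = "TestUser"
PSSWD = quote_plus("TestUserPass")   # URL-encode in case the password contains @, / or similar
SERVERNAME = "DESKTOP-QLSOTTG"
INSTANCENAME = "\\SQLEXPRESS"        # named instance; set to "" for a default instance
DB = "TestDB"

# With an instance name the URL becomes  ...@DESKTOP-QLSOTTG\SQLEXPRESS/TestDB?...
# With INSTANCENAME = "" it becomes      ...@DESKTOP-QLSOTTG/TestDB?...
engine = sql.create_engine(
    f"mssql+pyodbc://{USERNAME}:{PSSWD}@{SERVERNAME}{INSTANCENAME}/{DB}?driver={DRIVER}",
    fast_executemany=True,
)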

Alternate import statements for other providers
These include:

  • pymssql
  • turbodbc
import pymssql as ms
import sqlalchemy as sql
import sqlalchemy_turbodbc as st

Alternate Connection strings
Credit for DSN style strings, which I've modified to work with username/password.

conn_sqlalchemy = sql.create_engine(f"mssql+pyodbc://{USERNAME}:{PSSWD}@{SERVERNAME}{INSTANCENAME}/{DB}?driver={DRIVER}")

conn_executemany = sql.create_engine(
    f"mssql+pyodbc://{USERNAME}:{PSSWD}@{SERVERNAME}{INSTANCENAME}/{DB}?driver={DRIVER}", fast_executemany=True
)

conn_turbodbc = sql.create_engine(f"mssql+turbodbc://{USERNAME}:{PSSWD}@{SERVERNAME}{INSTANCENAME}/{DB}?driver={DRIVER}")

conn_pymssql = sql.create_engine(f"mssql+pymssql://{USERNAME}:{PSSWD}@{SERVERNAME}{INSTANCENAME}/{DB}")
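
If the URL form gives trouble (for example a password whose characters break URL parsing), another option I haven't benchmarked here is to build a raw ODBC connection string and pass it through odbc_connect, which hands it to pyodbc unchanged:

from urllib.parse import quote_plus
import sqlalchemy as sql

# Raw ODBC connection string, URL-encoded and passed straight through to pyodbc
params = quote_plus(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=DESKTOP-QLSOTTG\\SQLEXPRESS;"
    "DATABASE=TestDB;"
    "UID=TestUser;"
    "PWD=TestUserPass;"
)
conn_odbc_connect = sql.create_engine(
    f"mssql+pyodbc:///?odbc_connect={params}", fast_executemany=True
)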


