Clean-up database connection with SQLAlchemy in Pandas

Tags: python, pandas, sqlalchemy

With Pandas, I can very easily read data from a database into a dataframe:

from sqlalchemy import create_engine
import pandas


query = 'SELECT * FROM Table_Name;'
engine = create_engine('...')

df = pandas.read_sql_query(query, engine)

print(df.head())

I would like to make sure that no connection is kept open after executing .read_sql_query(), no matter whether the query succeeded or raised an exception.

I am currently:

Using a function to restrict the engine's scope. I only expect to call this function once every half hour, so I do not mind re-creating the engine if that helps ensure everything is cleaned/closed/garbage-collected.
Disabling pooling with poolclass=NullPool.
Finally calling engine.dispose().

Like so:

from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool
import pandas


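def get_data():
    query = 'SELECT * FROM Table_Name;'
    # Create the engine before `try`: if create_engine() failed inside the
    # try block, `engine` would be unbound when `finally` runs.
    engine = create_engine('...', poolclass=NullPool)
    try:
        df = pandas.read_sql_query(query, engine)
    finally:
        engine.dispose()
    return df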


print(get_data().head())

Is there a better way?


asked Jul 04 '18 09:07

Peque

1 Answer

Background:

When using SQLAlchemy with pandas' read_sql_query(query, con) method, pandas creates a SQLDatabase object whose connectable attribute is used to run self.connectable.execute(query). SQLDatabase.connectable is simply set to con, as long as con is an instance of sqlalchemy.engine.Connectable (i.e. an Engine or a Connection). (This describes pandas 1.x with SQLAlchemy 1.x; Engine.execute() was removed in SQLAlchemy 2.0.)

Case I: when passing `Engine` object as `con`

Just as in the example code in your question:

from sqlalchemy import create_engine
import pandas as pd

query = 'SELECT * FROM Table_Name;'
engine = create_engine('...')
df = pd.read_sql_query(query, con=engine)

Internally, pandas just uses result = engine.execute(query), which means:

Where above, the execute() method acquires a new Connection on its own, executes the statement with that object, and returns the ResultProxy. In this case, the ResultProxy contains a special flag known as close_with_result, which indicates that when its underlying DBAPI cursor is closed, the Connection object itself is also closed, which again returns the DBAPI connection to the connection pool, releasing transactional resources.

In this case, you don't have to worry about the Connection itself, which is closed automatically, but the engine still keeps its connection pool, so the underlying DBAPI connection stays open inside that pool.
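To see the retained pool for yourself, you can read the pool counters before and after dispose(). This is a sketch under assumptions: a throwaway SQLite file stands in for the real database, and QueuePool is passed explicitly so the result does not depend on which pool your SQLAlchemy version picks for SQLite by default.

```python
import os
import tempfile

import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

# Throwaway SQLite file (assumption, for illustration only).
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
engine = create_engine("sqlite:///" + db_path, poolclass=QueuePool)

df = pd.read_sql_query("SELECT 1 AS x", engine)

# The Connection pandas used has been closed: nothing is checked out...
checked_out = engine.pool.checkedout()
# ...but its DBAPI connection is still held, idle, inside the pool.
idle_before = engine.pool.checkedin()

# dispose() closes the pooled connections and swaps in a new, empty pool.
engine.dispose()
idle_after = engine.pool.checkedin()

print("checked out:", checked_out)
print("idle before dispose:", idle_before)
print("idle after dispose:", idle_after)
```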

So you can either disable pooling by using:

from sqlalchemy.pool import NullPool
engine = create_engine('...', poolclass=NullPool)

or dispose the engine entirely with engine.dispose() at the end.

But according to the Engine Disposal documentation (its last paragraph), these two are alternatives; you don't need to use both at the same time. So for a simple one-off read_sql_query call followed by clean-up, I think this should be enough:

# Clean up entirely after every query, even if the query raises.
engine = create_engine('...')
try:
    df = pd.read_sql_query(query, con=engine)
finally:
    engine.dispose()
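If the engine is re-created for every call, as in the question, the try/finally clean-up can be packaged once as a context manager so every caller gets the disposal automatically. A sketch, assuming a hypothetical helper name `disposable_engine` and an in-memory SQLite URL (`sqlite://`) standing in for the real connection string:

```python
from contextlib import contextmanager

import pandas as pd
from sqlalchemy import create_engine


@contextmanager
def disposable_engine(url, **engine_kwargs):
    """Hypothetical helper: yield an Engine and always dispose of it afterwards."""
    # Created outside `try`, so `finally` never sees an unbound name
    # if create_engine() itself fails.
    engine = create_engine(url, **engine_kwargs)
    try:
        yield engine
    finally:
        # Runs both on normal exit and when the with-block raises.
        engine.dispose()


with disposable_engine("sqlite://") as engine:
    df = pd.read_sql_query("SELECT 1 AS x", engine)

print(df)
```

Passing `poolclass=NullPool` through `engine_kwargs` would be the other option; as noted above, you don't need both.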

Case II: when passing `Connection` object as `con`:

connection = engine.connect()
print(connection.closed) # False
df = pd.read_sql_query(query, con=connection)
print(connection.closed) # False again
# do_something_else(connection)
connection.close()
print(connection.closed) # True
engine.dispose()

You should do this whenever you want greater control over the attributes of the connection, when it gets closed, etc. An important example of this is a Transaction, which lets you decide when to commit your changes to the database. (from this answer)
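A small sketch of what that control looks like: on one Connection, one insert is committed by a `with conn.begin()` block, another is explicitly rolled back, and the same Connection is then handed to pandas for reading. The in-memory SQLite database and the `items` table are assumptions made only so the example is self-contained.

```python
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")  # in-memory SQLite, for illustration only

with engine.connect() as conn:
    # Committed: the `with conn.begin()` block commits when it exits normally.
    with conn.begin():
        conn.execute(text("CREATE TABLE items (name TEXT)"))
        conn.execute(text("INSERT INTO items VALUES ('kept')"))

    # Explicitly rolled back: this row never reaches the database.
    trans = conn.begin()
    conn.execute(text("INSERT INTO items VALUES ('discarded')"))
    trans.rollback()

    # The same Connection is now used by pandas for reading.
    df = pd.read_sql_query("SELECT name FROM items", conn)

engine.dispose()
print(df)
```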

With pandas, however, we have no control over what happens inside read_sql_query; the only benefit of passing a Connection is that you can do more work with it before explicitly closing it.

So generally speaking:

I would use the following pattern, which gives more control over connections and leaves room for future extension:

engine = create_engine('...')
# Context manager makes sure the `Connection` is closed safely and implicitly.
# (Printed values reflect SQLAlchemy 1.x; with 2.x "autobegin", the first
# in_transaction() prints True and conn.begin() would raise.)
with engine.connect() as conn:
    df = pd.read_sql_query(query, conn)
    print(conn.in_transaction()) # False
    # do_something_with(conn)
    trans = conn.begin()
    print(conn.in_transaction()) # True
    # do_whatever_with(trans)
    print(conn.closed) # False
print('Connection closed after the with block?', conn.closed) # True
engine.dispose()

But for simple use cases such as your example code, I think both ways are equally clean and simple for cleaning up DB I/O resources.
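For the once-every-half-hour job in the question, both ideas can be combined in a sketch like the one below: the Connection is closed by its context manager and the engine is disposed in a `finally` block, so nothing is left open whether the query succeeds or raises. `read_and_release` is a hypothetical name, and `sqlite://` (in-memory SQLite) stands in for the real database URL.

```python
import pandas as pd
from sqlalchemy import create_engine


def read_and_release(url, query):
    """Hypothetical helper: run `query`, return a DataFrame, leave nothing open."""
    engine = create_engine(url)
    try:
        # The with-block closes the Connection, even if read_sql_query raises.
        with engine.connect() as conn:
            return pd.read_sql_query(query, conn)
    finally:
        # Runs after the Connection is closed and empties the engine's pool.
        engine.dispose()


df = read_and_release("sqlite://", "SELECT 1 AS x")
print(df)

try:
    read_and_release("sqlite://", "SELECT * FROM no_such_table")
    failed = False
except Exception as exc:  # exact exception type depends on pandas/SQLAlchemy versions
    failed = True
    print("query failed:", type(exc).__name__)
```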


answered Oct 13 '22 20:10

YaOzI
