With Pandas, I can very easily read data from a database into a dataframe:
from sqlalchemy import create_engine
import pandas
query = 'SELECT * FROM Table_Name;'
engine = create_engine('...')
df = pandas.read_sql_query(query, engine)
print(df.head())
I would like to make sure that no connection is kept open after executing .read_sql_query(), no matter whether the query succeeded or raised an exception.
I am currently using poolclass=NullPool and engine.dispose(), like so:
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool
import pandas
def get_data():
    query = 'SELECT * FROM Table_Name;'
    engine = create_engine('...', poolclass=NullPool)
    try:
        df = pandas.read_sql_query(query, engine)
    finally:
        engine.dispose()
    return df
print(get_data().head())
Is there a better way?
The connect() method returns a Connection object, and by using it in a Python context manager (e.g. the with: statement), the Connection.close() method is automatically invoked at the end of the block.
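A minimal sketch of that behaviour, assuming SQLAlchemy 1.4+ and an in-memory SQLite URL standing in for your real connection string:

```python
from sqlalchemy import create_engine

engine = create_engine("sqlite://")  # in-memory SQLite stands in for your real URL
with engine.connect() as conn:
    conn.exec_driver_sql("SELECT 1")
    print(conn.closed)  # False inside the block
# Connection.close() was invoked automatically when the block ended
print(conn.closed)  # True
```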
When using sqlalchemy with the pandas read_sql_query(query, con) method, it will create a SQLDatabase object with an attribute connectable, and run the query via self.connectable.execute(query). SQLDatabase.connectable is initialized as con, as long as con is an instance of sqlalchemy.engine.Connectable (i.e. Engine or Connection).
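Both kinds of Connectable are accepted; a sketch (assuming SQLAlchemy 1.4+ and pandas, with an in-memory SQLite database and a made-up table t):

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite://")  # in-memory SQLite stands in for a real URL
with engine.begin() as conn:
    conn.exec_driver_sql("CREATE TABLE t (x INTEGER)")
    conn.exec_driver_sql("INSERT INTO t VALUES (1)")

df1 = pd.read_sql_query("SELECT * FROM t", con=engine)      # Engine as con
with engine.connect() as conn:
    df2 = pd.read_sql_query("SELECT * FROM t", con=conn)    # Connection as con
print(df1.equals(df2))  # True -- both Connectables give the same result
```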
Engine object as con
Just like the example code in your question:
from sqlalchemy import create_engine
import pandas as pd
engine = create_engine('...')
df = pd.read_sql_query(query, con=engine)
Internally, pandas just uses result = engine.execute(query), which means:
Where above, the execute() method acquires a new Connection on its own, executes the statement with that object, and returns the ResultProxy. In this case, the ResultProxy contains a special flag known as close_with_result, which indicates that when its underlying DBAPI cursor is closed, the Connection object itself is also closed, which again returns the DBAPI connection to the connection pool, releasing transactional resources.
In this case, you don't have to worry about the Connection itself, which is closed automatically, but the engine will still keep its connection pool.
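This can be observed from the pool itself; a sketch assuming SQLAlchemy 1.4+ and pandas, where QueuePool is forced explicitly because an in-memory SQLite URL (standing in for a real database) would otherwise default to a different pool class:

```python
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

# Force a QueuePool for the in-memory stand-in database.
engine = create_engine("sqlite://", poolclass=QueuePool)
with engine.begin() as conn:
    conn.exec_driver_sql("CREATE TABLE t (x INTEGER)")
    conn.exec_driver_sql("INSERT INTO t VALUES (1)")

df = pd.read_sql_query("SELECT * FROM t", con=engine)
print(engine.pool.checkedout())  # 0 -- nothing is left checked out
print(engine.pool.checkedin())   # 1 -- but the pool still holds the connection
```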
So you can either disable pooling by using:
engine = create_engine('...', poolclass=NullPool)
or dispose of the engine entirely with engine.dispose() at the end.
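A sketch of the two alternatives side by side (sqlite:// stands in for your real connection URL):

```python
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

# Alternative 1: no pooling at all -- every connection is closed on release.
engine_a = create_engine("sqlite://", poolclass=NullPool)
print(type(engine_a.pool).__name__)  # NullPool

# Alternative 2: keep the default pool, but dispose of it when done,
# which closes all currently pooled connections.
engine_b = create_engine("sqlite://")
engine_b.dispose()
```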
But following the Engine Disposal doc (the last paragraph), these two are alternatives; you don't have to use them at the same time. So in this case, for a simple one-time use of read_sql_query and clean-up, I think this should be enough:
# Clean up entirely after every query.
engine = create_engine('...')
df = pd.read_sql_query(query, con=engine)
engine.dispose()
Connection object as con:
connection = engine.connect()
print(connection.closed) # False
df = pd.read_sql_query(query, con=connection)
print(connection.closed) # False again
# do_something_else(connection)
connection.close()
print(connection.closed) # True
engine.dispose()
You should do this whenever you want greater control over attributes of the connection, such as when it gets closed. For example, a very important use case is a Transaction, which lets you decide when to commit your changes to the database. (from this answer)
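A sketch of such transaction control (assuming SQLAlchemy 1.4+, with an in-memory SQLite database and a made-up table t):

```python
from sqlalchemy import create_engine

engine = create_engine("sqlite://")  # in-memory SQLite stands in for a real URL
with engine.connect() as conn:
    trans = conn.begin()  # begin explicitly, before any statement runs
    conn.exec_driver_sql("CREATE TABLE t (x INTEGER)")
    conn.exec_driver_sql("INSERT INTO t VALUES (1)")
    trans.commit()        # persist both statements

    trans = conn.begin()
    conn.exec_driver_sql("INSERT INTO t VALUES (2)")
    trans.rollback()      # discard the second insert

    count = conn.exec_driver_sql("SELECT COUNT(*) FROM t").scalar()
    print(count)  # 1 -- only the committed row survived
engine.dispose()
```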
But with pandas, we have no control inside read_sql_query itself; the only benefit of passing a Connection is that it allows you to do more useful things with it before you explicitly close it.
I think I would like to use the following pattern, which gives me more control over connections and leaves room for future extensibility:
engine = create_engine('...')
# Context manager makes sure the `Connection` is closed safely and implicitly
with engine.connect() as conn:
    df = pd.read_sql_query(query, conn)
    print(conn.in_transaction())  # False
    # do_something_with(conn)
    trans = conn.begin()
    print(conn.in_transaction())  # True
    # do_whatever_with(trans)
    print(conn.closed)  # False
print('Is Connection with-OUT closed?', conn.closed)  # True
engine.dispose()
But for simple use cases such as your example code, I think both ways are equally clean and simple for cleaning up DB I/O resources.