Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas and SQLAlchemy: df.to_sql() with SQLAlchemy 2.0 future=True throws an error when using a connection from engine.begin()

I tried to upsert a table in MS SQL Server from a pandas DataFrame.

As I understand it, it's a two step process:

  • do a pandas df.to_sql() into a temp table
  • execute a magic sql to merger the temp table in the existing final table

this works, but only if i set future=False in the create_engine call.

   self.engine = create_engine(self.url, echo=True, future=False)
  with self.engine.begin() as conn:
            df.to_sql("#update_table", conn, if_exists="append", index=False)
            # merge...
            sql = f"""\
            -- magic sql
            """
            result = conn.execute(text(sql)).fetchone()

When using future=True I get this error:

Traceback (most recent call last):
  File "/usr/local/Caskroom/miniconda/base/envs/dc_uk/lib/python3.9/site-packages/pandas/io/sql.py", line 1340, in insert_records
    table.insert(chunksize=chunksize, method=method)
  File "/usr/local/Caskroom/miniconda/base/envs/dc_uk/lib/python3.9/site-packages/pandas/io/sql.py", line 959, in insert
    with self.pd_sql.run_transaction() as conn:
  File "/usr/local/Caskroom/miniconda/base/envs/dc_uk/lib/python3.9/contextlib.py", line 119, in __enter__
    return next(self.gen)
  File "/usr/local/Caskroom/miniconda/base/envs/dc_uk/lib/python3.9/site-packages/pandas/io/sql.py", line 1416, in run_transaction
    with self.connectable.begin() as tx:
  File "/usr/local/Caskroom/miniconda/base/envs/dc_uk/lib/python3.9/site-packages/sqlalchemy/future/engine.py", line 144, in begin
    return super(Connection, self).begin()
  File "/usr/local/Caskroom/miniconda/base/envs/dc_uk/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 766, in begin
    raise exc.InvalidRequestError(
sqlalchemy.exc.InvalidRequestError: This connection has already initialized a SQLAlchemy Transaction() object via begin() or autobegin; can't call begin() here unless rollback() or commit() is called first.

It seems like something in pandas can't deal (yet) with the changes in SA2. :(

There are some questions about SQLAlchemy 2.0 from earlier this year but all of them are closed.

My Question: Anything i did wrong? or is that still an open issue...?

PS: This is on a mac, python 3.9.7, pandas 1.3.4, and SQLAlchemy 1.4.27.

like image 564
A. Rabus Avatar asked Jul 03 '26 08:07

A. Rabus


2 Answers

Per the docs "future" causes a transaction to start automatically, so you don't call .begin().

eg

with engine.connect() as conn:
    conn.execute(...)
    conn.execute(...)
    conn.commit()

    conn.execute(...)
    conn.execute(...)
    conn.commit()

method sqlalchemy.future.Connection.begin()

like image 109
David Browne - Microsoft Avatar answered Jul 05 '26 00:07

David Browne - Microsoft


This is a known issue in pandas. See this issue comment.

Rather than setting future=False, I replaced the call to to_sql with to_dict + SQL query.

(I'm using postgresql and appending doesn't take a temp table.)

        data_dict = data_df.reset_index().to_dict(orient="records")

        query = (
            sqla.dialects.postgresql.insert(table)
            .values(data_dict)
            .on_conflict_do_nothing()
        )
        db.session.execute(query)
        db.session.commit()
like image 21
Jérôme Avatar answered Jul 04 '26 22:07

Jérôme



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!