 

write_database(..., engine="adbc") with autocommit=False

In polars, I would like to use pl.write_database multiple times with engine="adbc" in the same session and then commit all at the end with conn.commit(), i.e. do a manual commit.

import adbc_driver_postgresql.dbapi as pg_dbapi
import polars as pl

conn = pg_dbapi.connect("postgresql://username:password@host:port/database")

df = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

df.write_database(
    "public.table1",
    connection=conn,
    engine="adbc",
)

df.transpose().write_database(
    "public.table2",
    connection=conn,
    engine="adbc",
)

conn.commit()

The goal is to ensure that either both DataFrames are written to the database or neither is. However, each DataFrame is written to the database immediately, one after the other. The ADBC docs say:

By default, connections are expected to operate in autocommit mode; that is, queries take effect immediately upon execution. This can be disabled in favor of manual commit/rollback calls, but not all implementations will support this.

Is there a way to disable autocommit in Python? Maybe through adbc_driver_postgresql.dbapi.connect, e.g. with the conn_kwargs parameter? conn_kwargs={"autocommit": False} didn't work.

asked May 24 '26 15:05 by mouwsy


1 Answer

use pl.write_database multiple times with engine="adbc" in the same session and then commit all at the end with conn.commit(), i.e. do a manual commit.

Surprisingly, this won't work, because polars quietly issues a commit of its own inside DataFrame.write_database(). As per PEP 249 – Python Database API Specification v2.0, autocommit is off by default (and in your case it stays off the whole time):

Note that if the database supports an auto-commit feature, this must be initially off. An interface method may be provided to turn it back on.
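To illustrate what "initially off" means in practice, here is a minimal sketch that uses Python's built-in sqlite3 module, itself a PEP 249 driver, as a stand-in for Postgres. That substitution is an assumption made only so the example runs without a server, and the function name visible_counts is hypothetical. Rows inserted on one connection stay invisible to a second connection until commit() is called:

```python
import os
import sqlite3
import tempfile


def visible_counts() -> tuple[int, int]:
    """Hypothetical demo: row count seen by a second connection before/after commit."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")  # sqlite3 stands in for Postgres here
        writer = sqlite3.connect(path)
        reader = sqlite3.connect(path)
        try:
            writer.execute("CREATE TABLE t (a INTEGER)")
            writer.commit()
            # Autocommit is off (PEP 249 default): this INSERT opens a transaction.
            writer.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
            before = reader.execute("SELECT COUNT(*) FROM t").fetchall()[0][0]
            writer.commit()  # the manual commit publishes the rows
            after = reader.execute("SELECT COUNT(*) FROM t").fetchall()[0][0]
        finally:
            writer.close()
            reader.close()
    return before, after


print(visible_counts())  # (0, 3)
```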

You can confirm that the signature of adbc_driver_manager.dbapi.Connection follows the spec:

class adbc_driver_manager.dbapi.Connection(
   db: AdbcDatabase | _SharedDatabase, 
   conn: AdbcConnection, 
   conn_kwargs: Dict[str, str] | None = None, 
   *, 
   autocommit=False,  # <- here
   backend: DbapiBackend | None = None)

Double-check that dynamically, by inspecting the Connection's underlying AdbcConnection:

import adbc_driver_postgresql.dbapi as pg_dbapi
import polars as pl

conn = pg_dbapi.connect(
    "postgresql://username:password@host:port/database",
    # pass the option explicitly; the value must be the string "false"
    conn_kwargs={"adbc.connection.autocommit": "false"},
)
print(conn.adbc_connection.get_option('adbc.connection.autocommit'))

The value has to be a string, not a boolean, and it's case-sensitive: it doesn't follow Postgres' own, highly tolerant boolean input syntax, which accepts any case variant of [0,1,f,fa,fal,fals,false,t,tr,tru,true,n,no,y,ye,yes,of,off,on], even with leading and trailing whitespace.
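The two parsing rules can be contrasted with a pair of hypothetical helpers. pg_style_bool approximates Postgres' documented boolean input rules (true/yes/on/1 and false/no/off/0, plus unambiguous prefixes, case-insensitive, whitespace-trimmed), and adbc_style_bool models the strict behavior described above for the ADBC option. Neither is a real library function:

```python
# Hypothetical helpers; not part of Postgres, ADBC, or polars.
TRUE_WORDS = ("true", "yes", "on")
FALSE_WORDS = ("false", "no", "off")


def pg_style_bool(text: str) -> bool:
    """Approximate Postgres boolean input: trim, ignore case, accept prefixes."""
    s = text.strip().lower()
    if s in ("1", "0"):
        return s == "1"
    if not s:
        raise ValueError(f"invalid boolean: {text!r}")
    matches = [
        value
        for words, value in ((TRUE_WORDS, True), (FALSE_WORDS, False))
        for word in words
        if word.startswith(s)
    ]
    if len(matches) != 1:  # no match, or ambiguous like "o" (on/off)
        raise ValueError(f"invalid boolean: {text!r}")
    return matches[0]


def adbc_style_bool(text: str) -> bool:
    """Strict rule: only the exact lowercase strings "true" and "false"."""
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"invalid option value: {text!r}")
```

For example, pg_style_bool("  OfF ") returns False, while adbc_style_bool("False") raises ValueError.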
Whether or not you pass the parameter explicitly, autocommit ends up off. Strangely enough, even if you pass "true" when opening the connection, it stays off unless you enable it manually afterwards:

conn.adbc_connection.set_autocommit(True)

That's how you can tell polars is responsible: with autocommit enabled, df.write_database() fails, because it tries to run a commit of its own:

adbc_driver_manager.ProgrammingError: INVALID_STATE: 
[libpq] Cannot commit when autocommit is enabled

This isn't mentioned in the documentation, but the source (polars/py-polars/polars/dataframe/frame.py:4405) confirms that when engine="adbc", write_database() runs conn.commit() before returning, and it exposes no parameter to turn that off.
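Why one hidden commit defeats the all-or-nothing goal can be sketched with sqlite3 standing in for Postgres (an assumption, for runnability). write_and_commit is a hypothetical writer that, like the ADBC path described above, commits before returning; when the second write fails, the caller's rollback can no longer undo the first table:

```python
import sqlite3


def write_and_commit(conn: sqlite3.Connection, table: str, rows: list[tuple]) -> None:
    """Hypothetical writer that always commits, like the ADBC path in polars."""
    # Table names are trusted constants in this demo.
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    conn.commit()  # unconditional: the caller loses control of the transaction


def demo_commit_inside() -> tuple[int, int]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (a INTEGER, b INTEGER)")
    conn.execute("CREATE TABLE table2 (a INTEGER NOT NULL, b INTEGER)")
    try:
        write_and_commit(conn, "table1", [(1, 4), (2, 5), (3, 6)])
        write_and_commit(conn, "table2", [(None, 1)])  # violates NOT NULL
        conn.commit()
    except sqlite3.IntegrityError:
        conn.rollback()  # too late: table1 was already committed
    counts = tuple(
        conn.execute(f"SELECT COUNT(*) FROM {t}").fetchall()[0][0]
        for t in ("table1", "table2")
    )
    conn.close()
    return counts


print(demo_commit_inside())  # (3, 0)
```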

Your options are:

  1. Patch that quiet commit out of polars. It looks like the authors put it there with the method's initial implementation (acec4c1, feature #7318), and it has been left in place since. That might change with #24243.
  2. Use the sqlalchemy engine for that operation instead. In that case, .write_database() doesn't add the quiet commit; it delegates to pandas' DataFrame.to_sql(), which respects the transaction state:

    If passing a sqlalchemy.engine.Connection which is already in a transaction, the transaction will not be committed.
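The behavior quoted from the pandas docs (leave an already-open transaction alone) can be sketched with sqlite3 standing in for Postgres, as an assumption for runnability. write_respecting_tx is a hypothetical helper modeled on that documented behavior, not pandas' actual code. Because the caller owns the transaction, one failing write rolls back both tables:

```python
import sqlite3


def write_respecting_tx(conn: sqlite3.Connection, table: str, rows: list[tuple]) -> None:
    """Hypothetical writer: commits only if it opened the transaction itself."""
    owns_tx = not conn.in_transaction
    if owns_tx:
        conn.execute("BEGIN")
    try:
        # Table names are trusted constants in this demo.
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    except sqlite3.Error:
        if owns_tx:
            conn.rollback()
        raise
    if owns_tx:
        conn.commit()


def demo_caller_tx() -> tuple[int, int]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (a INTEGER, b INTEGER)")
    conn.execute("CREATE TABLE table2 (a INTEGER NOT NULL, b INTEGER)")
    conn.execute("BEGIN")  # caller opens one transaction for both writes
    try:
        write_respecting_tx(conn, "table1", [(1, 4), (2, 5), (3, 6)])
        write_respecting_tx(conn, "table2", [(None, 1)])  # violates NOT NULL
        conn.commit()  # single manual commit at the end
    except sqlite3.IntegrityError:
        conn.rollback()  # undoes table1 too: all or nothing
    counts = tuple(
        conn.execute(f"SELECT COUNT(*) FROM {t}").fetchall()[0][0]
        for t in ("table1", "table2")
    )
    conn.close()
    return counts


print(demo_caller_tx())  # (0, 0)
```

Called outside any transaction, the same helper opens and commits its own, so standalone writes still persist.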

answered May 27 '26 04:05 by Zegarek


