Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas to_sql can't write to schema besides 'public' on PostgreSQL

I'm trying to write the contents of a data frame to a table in a schema besides the 'public' schema. I followed the pattern described in Pandas writing dataframe to other postgresql schema:

meta = sqlalchemy.MetaData()
engine = create_engine('postgresql://some:user@host/db')
meta = sqlalchemy.MetaData(engine, schema='schema')
meta.reflect(engine, schema='schema')
pdsql = pandas.io.sql.PandasSQLAlchemy(engine, meta=meta)

But when I try to write to the table:

pdsql.to_sql(df, 'table', if_exists='append')

I get the following error:

InvalidRequestError: Table 'schema.table' is already defined for this MetaData instance.  Specify 'extend_existing=True' to redefine options and columns on an existing Table object.

I also tried adding extend_existing=Trueto the reflect call, but that doesn't seem to make a difference.

How can I get pandas to write to this table?

like image 220
2daaa Avatar asked Nov 06 '25 21:11

2daaa


1 Answers

Update: starting from pandas 0.15, writing to different schema's is supported. Then you will be able to use the schema keyword argument:

df.to_sql('test', engine, schema='a_schema')

As I said in the linked question, writing to different schema's is not yet supported at the moment with the read_sql and to_sql functions (but an enhancement request has already been filed: https://github.com/pydata/pandas/issues/7441).

However, I described a workaround using the object interface. But what I described there only works for adding the table once, not for replacing and/or appending the table. So if you just want to add, first delete the existing table and then write again.

If you want to append to the table, below is a little bit more hacky workaround. First redefine has_table and get_table:

def has_table(self, name):
    return self.engine.has_table(name, schema=self.meta.schema)

def get_table(self, table_name):
    if self.meta.schema:
        table_name = self.meta.schema + '.' + table_name
    return self.meta.tables.get(table_name)

pd.io.sql.PandasSQLAlchemy.has_table = has_table
pd.io.sql.PandasSQLAlchemy.get_table = get_table

Then create the PandasSQLAlchemy object as you did, and write the data:

meta = sqlalchemy.MetaData(engine, schema='schema')
meta.reflect()
pdsql = pd.io.sql.PandasSQLAlchemy(engine, meta=meta)
pdsql.to_sql(df, 'table', if_exists='append')

This is obviously not the good way to do, but we are working to provide a better API for 0.15. If you want to help, pitch in at https://github.com/pydata/pandas/issues/7441.

Beware! This interface (PandasSQLAlchemy) is not yet really public and will still undergo changes in the next version of pandas, but this is how you can do it for pandas 0.14(.1).

Update: PandasSQLAlchemy is renamed to SQLDatabase in pandas 0.15.

like image 178
joris Avatar answered Nov 08 '25 11:11

joris