Pandas has a lovely to_sql
method for writing dataframes to any RDBMS supported by SQLAlchemy.
Say I have a dataframe generated thusly:
df = pd.DataFrame([-1.04, 0.70, 0.11, -0.43, 1.0], columns=['value'])
If I try to write it to the database without any special behavior, I get a column type of double precision:
df.to_sql('foo_test', an_engine)
If I wanted a different datatype, I could specify it (this works fine):
df.to_sql('foo_test', an_engine, dtype={'value': sqlalchemy.types.NUMERIC})
But if I want to set the precision and scale of the NUMERIC
column, it blows up in my face:
df.to_sql('foo_test', an_engine, dtype={'value': sqlalchemy.types.NUMERIC(10,2)})
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-77-dc008463fbfc> in <module>()
1 df = pd.DataFrame([-1.04, 0.70, 0.11, -0.43, 1.0], columns=['value'])
----> 2 df.to_sql('foo_test', cosd_engine, dtype={'value': sqlalchemy.types.NUMERIC(10,2)})
/Users/igazit/.virtualenvs/myproject/lib/python2.7/site-packages/pandas/core/generic.pyc in to_sql(self, name, con, flavor, schema, if_exists, index, index_label, chunksize, dtype)
964 self, name, con, flavor=flavor, schema=schema, if_exists=if_exists,
965 index=index, index_label=index_label, chunksize=chunksize,
--> 966 dtype=dtype)
967
968 def to_pickle(self, path):
/Users/igazit/.virtualenvs/myproject/lib/python2.7/site-packages/pandas/io/sql.pyc in to_sql(frame, name, con, flavor, schema, if_exists, index, index_label, chunksize, dtype)
536 pandas_sql.to_sql(frame, name, if_exists=if_exists, index=index,
537 index_label=index_label, schema=schema,
--> 538 chunksize=chunksize, dtype=dtype)
539
540
/Users/igazit/.virtualenvs/myproject/lib/python2.7/site-packages/pandas/io/sql.pyc in to_sql(self, frame, name, if_exists, index, index_label, schema, chunksize, dtype)
1162 import sqlalchemy.sql.type_api as type_api
1163 for col, my_type in dtype.items():
-> 1164 if not issubclass(my_type, type_api.TypeEngine):
1165 raise ValueError('The type of %s is not a SQLAlchemy '
1166 'type ' % col)
TypeError: issubclass() arg 1 must be a class
I'm trying to dig into why sqlalchemy.types.NUMERIC
passes the test on line 1164, while sqlalchemy.types.NUMERIC(10,2)
doesn't. They do have different types (sqlalchemy.sql.visitors.VisitableType
vs sqlalchemy.sql.sqltypes.NUMERIC): the former is a class,
the latter is an instance, and issubclass() requires a class as its first argument.
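A quick check in the REPL shows the distinction (assuming SQLAlchemy is installed; to_instance is the helper pandas later switched to):

```python
import sqlalchemy.types as types

# NUMERIC itself is a class, so issubclass() is happy
assert issubclass(types.NUMERIC, types.TypeEngine)

# NUMERIC(10, 2) is an *instance*, so issubclass() raises TypeError
try:
    issubclass(types.NUMERIC(10, 2), types.TypeEngine)
except TypeError as e:
    print(e)  # e.g. "issubclass() arg 1 must be a class"

# isinstance() + to_instance() accepts both the class and the instance form
assert isinstance(types.to_instance(types.NUMERIC), types.TypeEngine)
assert isinstance(types.to_instance(types.NUMERIC(10, 2)), types.TypeEngine)
```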
Any clues would be much appreciated!
Update: this bug is fixed in pandas >= 0.16.0.
This is a known pandas bug in 0.15.2 that produces the same error:
https://github.com/pydata/pandas/issues/9083
A collaborator there suggests monkey-patching to_sql as a workaround:
import warnings

import pandas as pd
from pandas.io.sql import SQLTable


def to_sql(self, frame, name, if_exists='fail', index=True,
           index_label=None, schema=None, chunksize=None, dtype=None):
    """
    Patched version of https://github.com/pydata/pandas/blob/v0.15.2/pandas/io/sql.py#L1129
    """
    if dtype is not None:
        from sqlalchemy.types import to_instance, TypeEngine
        for col, my_type in dtype.items():
            # Convert to an instance first, then isinstance() works for
            # both the class form (NUMERIC) and the instance form (NUMERIC(10,2))
            if not isinstance(to_instance(my_type), TypeEngine):
                raise ValueError('The type of %s is not a SQLAlchemy '
                                 'type ' % col)

    table = SQLTable(name, self, frame=frame, index=index,
                     if_exists=if_exists, index_label=index_label,
                     schema=schema, dtype=dtype)
    table.create()
    table.insert(chunksize)
    # check for potential case sensitivity issues (GH7815)
    if name not in self.engine.table_names(schema=schema or self.meta.schema):
        warnings.warn("The provided table name '{0}' is not found exactly "
                      "as such in the database after writing the table, "
                      "possibly due to case sensitivity issues. Consider "
                      "using lower case table names.".format(name), UserWarning)


pd.io.sql.SQLDatabase.to_sql = to_sql
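With the patch applied (or on pandas >= 0.16.0, where no patch is needed), the parameterized instance form works. A quick smoke test with an in-memory SQLite engine:

```python
import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine('sqlite://')  # throwaway in-memory database
df = pd.DataFrame([-1.04, 0.70, 0.11, -0.43, 1.0], columns=['value'])

# Passing a parameterized type instance no longer blows up
df.to_sql('foo_test', engine, dtype={'value': sqlalchemy.types.NUMERIC(10, 2)})

print(pd.read_sql('foo_test', engine))
```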