Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas to_sql with parameterized data types like NUMERIC(10,2)

Pandas has a lovely to_sql method for writing dataframes to any RDBMS supported by SQLAlchemy.

Say I have a dataframe generated thusly:

df = pd.DataFrame([-1.04, 0.70, 0.11, -0.43, 1.0], columns=['value'])

If I try to write it to the database without any special behavior, I get a column type of double precision:

df.to_sql('foo_test', an_engine)

If I wanted a different datatype, I could specify it (this works fine):

df.to_sql('foo_test', an_engine, dtype={'value': sqlalchemy.types.NUMERIC})

But if I want to set the precision and scale of the NUMERIC column, it blows up in my face:

df.to_sql('foo_test', an_engine, dtype={'value': sqlalchemy.types.NUMERIC(10,2)})


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-77-dc008463fbfc> in <module>()
      1 df = pd.DataFrame([-1.04, 0.70, 0.11, -0.43, 1.0], columns=['value'])
----> 2 df.to_sql('foo_test', cosd_engine, dtype={'value': sqlalchemy.types.NUMERIC(10,2)})

/Users/igazit/.virtualenvs/myproject/lib/python2.7/site-packages/pandas/core/generic.pyc in to_sql(self, name, con, flavor, schema, if_exists, index, index_label, chunksize, dtype)
    964             self, name, con, flavor=flavor, schema=schema, if_exists=if_exists,
    965             index=index, index_label=index_label, chunksize=chunksize,
--> 966             dtype=dtype)
    967 
    968     def to_pickle(self, path):

/Users/igazit/.virtualenvs/myproject/lib/python2.7/site-packages/pandas/io/sql.pyc in to_sql(frame, name, con, flavor, schema, if_exists, index, index_label, chunksize, dtype)
    536     pandas_sql.to_sql(frame, name, if_exists=if_exists, index=index,
    537                       index_label=index_label, schema=schema,
--> 538                       chunksize=chunksize, dtype=dtype)
    539 
    540 

/Users/igazit/.virtualenvs/myproject/lib/python2.7/site-packages/pandas/io/sql.pyc in to_sql(self, frame, name, if_exists, index, index_label, schema, chunksize, dtype)
   1162             import sqlalchemy.sql.type_api as type_api
   1163             for col, my_type in dtype.items():
-> 1164                 if not issubclass(my_type, type_api.TypeEngine):
   1165                     raise ValueError('The type of %s is not a SQLAlchemy '
   1166                                      'type ' % col)

TypeError: issubclass() arg 1 must be a class

I'm trying to dig into why the type for sqlalchemy.types.NUMERIC passes the test on 1164, while sqlalchemy.types.NUMERIC(10,2) doesn't. They do have different types (sqlalchemy.sql.visitors.VisitableType vs sqlalchemy.sql.sqltypes.NUMERIC).

Any clues would be much appreciated!

like image 599
Idan Gazit Avatar asked Feb 11 '23 05:02

Idan Gazit


1 Answers

Update: this bug is fixed for pandas >= 0.16.0


This is post about a recent pandas bug with the same error with 0.15.2.

https://github.com/pydata/pandas/issues/9083

A Collaborator suggests a to_sql monkey patch as a way to solve it

from pandas.io.sql import SQLTable

def to_sql(self, frame, name, if_exists='fail', index=True,
           index_label=None, schema=None, chunksize=None, dtype=None):
    """
    patched version of https://github.com/pydata/pandas/blob/v0.15.2/pandas/io/sql.py#L1129
    """
    if dtype is not None:
        from sqlalchemy.types import to_instance, TypeEngine
        for col, my_type in dtype.items():
            if not isinstance(to_instance(my_type), TypeEngine):
                raise ValueError('The type of %s is not a SQLAlchemy '
                                 'type ' % col)

    table = SQLTable(name, self, frame=frame, index=index,
                     if_exists=if_exists, index_label=index_label,
                     schema=schema, dtype=dtype)
    table.create()
    table.insert(chunksize)
    # check for potentially case sensitivity issues (GH7815)
    if name not in self.engine.table_names(schema=schema or self.meta.schema):
        warnings.warn("The provided table name '{0}' is not found exactly "
                      "as such in the database after writing the table, "
                      "possibly due to case sensitivity issues. Consider "
                      "using lower case table names.".format(name), UserWarning)

pd.io.sql.SQLDatabase.to_sql = to_sql
like image 87
Bob Haffner Avatar answered Feb 16 '23 02:02

Bob Haffner