
Pandas to_sql changing datatype in database table

Has anyone experienced this before?

I have a table with "int" and "varchar" columns - a report schedule table.

I am trying to import an Excel file (.xls extension) into this table using a Python program, reading the data with pandas and writing it to the database with to_sql.

The imported data is 1 row by 11 columns.

The import works successfully, but afterwards I noticed that the datatypes of the columns in the original table have been altered:

        int --> bigint
        char(1) --> varchar(max)
        varchar(30) --> varchar(max)

Any idea how I can prevent this? The switch in datatypes is causing issues in downstream routines.

    import urllib.parse

    import pandas as pd
    from sqlalchemy import create_engine

    # Read one sheet of the Excel file into a DataFrame
    df = pd.read_excel(schedule_file, sheet_name='Schedule')

    # Build a SQLAlchemy engine from an ODBC connection string
    params = urllib.parse.quote_plus(r'DRIVER={SQL Server};SERVER=<<IP>>;DATABASE=<<DB>>;UID=<<UID>>;PWD=<<PWD>>')
    conn_str = 'mssql+pyodbc:///?odbc_connect={}'.format(params)
    engine = create_engine(conn_str)

    table_name = 'REPORT_SCHEDULE'
    df.to_sql(name=table_name, con=engine, if_exists='replace', index=False)

TIA

asked Nov 14 '18 at 06:11 by Lucky


2 Answers

Consider using the dtype argument of pandas.DataFrame.to_sql, where you pass a dictionary of SQLAlchemy types keyed by column name:

import sqlalchemy
...
df.to_sql(name=table_name, con=engine, if_exists='replace', index=False,
          dtype={'name_of_datefld': sqlalchemy.types.DateTime(),
                 'name_of_intfld': sqlalchemy.types.INTEGER(),
                 'name_of_strfld': sqlalchemy.types.VARCHAR(length=30),
                 'name_of_floatfld': sqlalchemy.types.Float(precision=3, asdecimal=True),
                 'name_of_booleanfld': sqlalchemy.types.Boolean()})
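
For the REPORT_SCHEDULE table in the question, the dict might look something like this. The column names below are placeholders, since the real ones aren't shown; only the types (int, char(1), varchar(30)) come from the question:

import sqlalchemy

# Hypothetical column names for REPORT_SCHEDULE; substitute your real ones.
dtypes = {'report_id': sqlalchemy.types.INTEGER(),
          'active_flag': sqlalchemy.types.CHAR(length=1),
          'report_name': sqlalchemy.types.VARCHAR(length=30)}

df.to_sql(name=table_name, con=engine, if_exists='replace', index=False,
          dtype=dtypes)

Any column not listed in the dict still gets a type inferred by pandas, so it's worth listing every column whose type matters downstream.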
answered Dec 24 '22 at 23:12 by Parfait


I think this has more to do with how pandas handles the table if it already exists. Passing 'replace' for the if_exists argument tells pandas to drop your table and recreate it. But when re-creating the table, pandas does so on its own terms, inferring column types from the data stored in that particular DataFrame.

While providing the column datatypes will work, doing it for every such case can be cumbersome. So I would rather truncate the table in a separate statement and then just append the data to it, like so:

Instead of:

df.to_sql(name=table_name, con=engine, if_exists='replace', index=False)

I'd do:

with engine.connect() as con:
    # On SQLAlchemy 2.x, wrap the statement in sqlalchemy.text() and call con.commit()
    con.execute("TRUNCATE TABLE %s" % table_name)

df.to_sql(name=table_name, con=engine, if_exists='append', index=False)

The TRUNCATE statement empties the table without dropping it: all rows are removed inside the database, but the table keeps its original definition, including the column datatypes.
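
If you want to verify that the definition survived the reload, a quick check (assuming SQL Server, as in the question) is to read the column metadata back after the append:

# Sanity check: the column types should still match the original definition.
check = pd.read_sql(
    "SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH "
    "FROM INFORMATION_SCHEMA.COLUMNS "
    "WHERE TABLE_NAME = 'REPORT_SCHEDULE'",
    engine)
print(check)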

answered Dec 25 '22 at 00:12 by Bogdan Mircea