Has anyone experienced this before?
I have a table with "int" and "varchar" columns - a report schedule table.
I am trying to import an Excel file with the ".xls" extension into this table using a Python program: pandas reads the file and to_sql writes it to the table. The imported data is 1 row with 11 columns.
Import works successfully but after the import I noticed that the datatypes in the original table have now been altered from:
int --> bigint
char(1) --> varchar(max)
varchar(30) --> varchar(max)
Any idea how I can prevent this? The switch in datatypes is causing issues in downstream routines.
import urllib.parse
import pandas as pd
from sqlalchemy import create_engine

df = pd.read_excel(schedule_file, sheet_name='Schedule')
params = urllib.parse.quote_plus(r'DRIVER={SQL Server};SERVER=<<IP>>;DATABASE=<<DB>>;UID=<<UDI>>;PWD=<<PWD>>')
conn_str = 'mssql+pyodbc:///?odbc_connect={}'.format(params)
engine = create_engine(conn_str)
table_name = 'REPORT_SCHEDULE'
df.to_sql(name=table_name, con=engine, if_exists='replace', index=False)
TIA
Consider using the dtype argument of pandas.DataFrame.to_sql
where you pass a dictionary of SQLAlchemy types to named columns:
import sqlalchemy
...
data.to_sql(name=table_name, con=engine, if_exists='replace', index=False,
            dtype={'name_of_datefld': sqlalchemy.types.DateTime(),
                   'name_of_intfld': sqlalchemy.types.INTEGER(),
                   'name_of_strfld': sqlalchemy.types.VARCHAR(length=30),
                   'name_of_floatfld': sqlalchemy.types.Float(precision=3, asdecimal=True),
                   'name_of_booleanfld': sqlalchemy.types.Boolean()})
I think this has more to do with how pandas handles an existing table. The "replace" value for the if_exists argument tells pandas to drop your table and recreate it, and when pandas recreates the table it does so on its own terms, inferring the column types from whatever happens to be in that particular DataFrame.
While providing column datatypes will work, doing it for every such case can be cumbersome. I would rather truncate the table in a separate statement and then simply append the data to it, like so:
Instead of:
df.to_sql(name=table_name, con=engine, if_exists='replace', index=False)
I'd do:
import sqlalchemy

with engine.begin() as con:  # begin() opens a transaction and commits on exit (works on SQLAlchemy 1.4 and 2.x)
    con.execute(sqlalchemy.text(f"TRUNCATE TABLE {table_name}"))

df.to_sql(name=table_name, con=engine, if_exists='append', index=False)
TRUNCATE TABLE removes all the rows, but it is handled internally by the database and the existing table definition, column datatypes included, stays in place, so the subsequent append cannot alter it.
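If you want to confirm that the column definitions survived the load, a quick check against INFORMATION_SCHEMA works; this sketch assumes the same engine and REPORT_SCHEDULE table as above:

import pandas as pd

# Inspect the column datatypes SQL Server currently has for the target table.
check_sql = """
    SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = 'REPORT_SCHEDULE'
"""
print(pd.read_sql(check_sql, engine))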