
Pandas to_sql change column type from varchar to text

I wrote a little script to copy a table between SQL servers. It works, but one of the columns changed type from varchar to text... How do I make it copy the table with the same column types?

import pymssql
import pandas as pd
from sqlalchemy import create_engine


db_server = "1.2.3.4\\r2"
db_database="Test_DB"
db_user="vaf"
db_password="1234"

local_db_server="1.1.1.1\\r2"
local_db_database="Test_DB"
local_db_user="vaf"
local_db_password="1234"

some_query=("""
select * from some_table
""")




def main():
    conn = pymssql.connect(server=local_db_server, user=local_db_user,
                           password=local_db_password, database=local_db_database,
                           charset='UTF-8')
    data = pd.read_sql(some_query, conn)

    connection_string = 'mssql+pymssql://{}:{}@{}/{}'.format(db_user, db_password, db_server, db_database)
    engine = create_engine(connection_string)
    data.to_sql(name="some_table", con=engine, if_exists='replace', index=False)

if __name__ == "__main__":
    main()

Thanks

O. San asked Apr 26 '17 10:04


1 Answer

Consider three approaches:

SPECIFY TYPES (proactive: declares the types up front)

Using the dtype argument of pandas.DataFrame.to_sql, pass a dictionary of SQLAlchemy types keyed by column name.

import sqlalchemy

data.to_sql(name="some_table", con=engine, if_exists='replace', index=False,
            dtype={'datefld': sqlalchemy.types.DateTime(),
                   'intfld': sqlalchemy.types.INTEGER(),
                   'strfld': sqlalchemy.types.VARCHAR(length=255),
                   'floatfld': sqlalchemy.types.Float(precision=3, asdecimal=True),
                   'booleanfld': sqlalchemy.types.Boolean()})
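If you would rather not hand-write the dictionary for every table, it can be derived from the dataframe's dtypes. The helper below is a sketch (the function name and the 255-character default are my own choices, not part of the pandas API): object columns become VARCHAR instead of the TEXT that to_sql emits by default.

```python
import pandas as pd
import sqlalchemy

def infer_sqlalchemy_types(df, varchar_length=255):
    """Map each dataframe column to an explicit SQLAlchemy type.

    Object (string) columns become VARCHAR(varchar_length) rather than
    the TEXT type that to_sql would otherwise create.
    """
    type_map = {}
    for col, dtype in df.dtypes.items():
        if pd.api.types.is_bool_dtype(dtype):
            type_map[col] = sqlalchemy.types.Boolean()
        elif pd.api.types.is_integer_dtype(dtype):
            type_map[col] = sqlalchemy.types.INTEGER()
        elif pd.api.types.is_float_dtype(dtype):
            type_map[col] = sqlalchemy.types.Float()
        elif pd.api.types.is_datetime64_any_dtype(dtype):
            type_map[col] = sqlalchemy.types.DateTime()
        else:
            type_map[col] = sqlalchemy.types.VARCHAR(length=varchar_length)
    return type_map
```

Then pass the result straight through: data.to_sql(..., dtype=infer_sqlalchemy_types(data)).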

DELETE DATA (proactive: preserves the existing schema)

Clean out the table with a DELETE query, then append the data from pandas without structurally changing the table. (It is to_sql's if_exists='replace' that drops and re-creates the table, which is where the types change.) This approach assumes the dataframe always stays consistent with the database table: no new columns and no changed types.

def main():
    connection_string = 'mssql+pymssql://{}:{}@{}/{}'\
                        .format(db_user, db_password, db_server, db_database)
    engine = create_engine(connection_string)

    # IMPORT DATA INTO DATA FRAME
    data = pd.read_sql(some_query, engine)

    # SQL DELETE (CLEAN OUT TABLE) VIA TRANSACTION
    with engine.begin() as conn:
        conn.execute("DELETE FROM some_table")

    # MIGRATE DATA FRAME INTO TABLE (APPEND, NOT REPLACE)
    data.to_sql(name='some_table', con=engine, if_exists='append', index=False)
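The delete-then-append pattern can be demonstrated end to end with an in-memory SQLite engine (a stand-in for the SQL Server connection here; table and column names are illustrative). The point to observe is that the table's declared VARCHAR type survives the reload because the table is never re-created:

```python
import pandas as pd
from sqlalchemy import create_engine, text, types

engine = create_engine("sqlite://")

# Pre-built table whose schema we want to keep (VARCHAR, not TEXT)
old = pd.DataFrame({"strfld": ["old1", "old2"]})
old.to_sql("some_table", con=engine, index=False,
           dtype={"strfld": types.VARCHAR(length=255)})

# Clean out the rows in a transaction; the table itself is untouched
with engine.begin() as conn:
    conn.execute(text("DELETE FROM some_table"))

# Append the fresh data into the existing (empty) table
new = pd.DataFrame({"strfld": ["new1", "new2", "new3"]})
new.to_sql("some_table", con=engine, if_exists="append", index=False)
```

Note the text() wrapper around the raw DELETE statement: recent SQLAlchemy versions require it for string SQL.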

MODIFY COLUMN (reactive: fixes the type after the fact)

Alter the column after migration with a DDL SQL statement.

def main():
    connection_string = 'mssql+pymssql://{}:{}@{}/{}'\
                        .format(db_user, db_password, db_server, db_database)
    engine = create_engine(connection_string)

    # IMPORT DATA INTO DATA FRAME
    data = pd.read_sql(some_query, engine)

    # MIGRATE DATA FRAME INTO TABLE
    data.to_sql(name="some_table", con=engine, if_exists='replace', index=False)

    # ALTER COLUMN TYPE (ASSUMING USER HAS RIGHTS/PRIVILEGES)
    with engine.begin() as conn:
        conn.execute("ALTER TABLE some_table ALTER COLUMN mytextcolumn VARCHAR(255);")
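Whichever approach you pick, you can confirm the resulting column types by reflecting the table with SQLAlchemy's inspector. A minimal sketch, again using an in-memory SQLite engine as a stand-in (on SQL Server you would inspect your real engine the same way):

```python
import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine("sqlite://")
pd.DataFrame({"strfld": ["a"]}).to_sql(
    "some_table", con=engine, index=False,
    dtype={"strfld": sqlalchemy.types.VARCHAR(length=255)})

# Reflect the migrated table and print each column's declared type
for col in sqlalchemy.inspect(engine).get_columns("some_table"):
    print(col["name"], col["type"])  # e.g. strfld VARCHAR(255)
```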

I recommend the second approach, as I believe databases should be agnostic to application code such as Python and pandas. The initial build (or rebuild) of a table's schema should be a planned, manual process; no script should structurally change a database on the fly, only interact with its data.

Parfait answered Nov 14 '22 10:11