
pandas to_sql gives unicode decode error

I have a pandas DataFrame that I loaded via read_csv and am trying to push to a database via to_sql. When I attempt

df.to_sql("assessmentinfo_pivot", util.ENGINE)

I get back a UnicodeEncodeError:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 83-84: ordinal not in range(128)

to_sql has no encoding option for specifying utf-8, and the engine was created with encoding set to utf-8:

ENGINE = create_engine("mssql+pymssql://" +
                       config.get_local('CEDS_USERNAME') + ':' +
                       config.get_local('CEDS_PASSWORD') + '@' +
                       config.get_local('CEDS_SERVER') + '/' +
                       config.get_local('CEDS_DATABASE'),
                       encoding="utf-8")

Any pandas insight into getting this working properly? Most of my searches led me to people having a similar error with to_csv, which is resolved simply by adding encoding="utf-8", but that is unfortunately not an option here.

I tried paring the file down but it still gives errors even when stripped down to just the headers: http://pastebin.com/F362xGyP
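Since the error persists with only the headers, it may help to see exactly which header names contain non-ASCII characters. The following is a minimal sketch using only the standard library; the function name find_non_ascii and the sample header list are hypothetical, and with pandas the input would presumably be list(df.columns), assuming all column names are strings.

```python
def find_non_ascii(names):
    """Map each name that contains non-ASCII characters to a list of
    (index, character) pairs for the offending characters."""
    problems = {}
    for name in names:
        hits = [(i, ch) for i, ch in enumerate(name) if ord(ch) > 127]
        if hits:
            problems[name] = hits
    return problems


# Hypothetical header row, not taken from the original file.
headers = ["student_id", "\u00e9cole", "score"]
print(find_non_ascii(headers))
```

Any name reported here is one the 'ascii' codec cannot encode, so it is a likely trigger for the error above.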

lathomas64 asked Aug 26 '15 20:08


3 Answers

I experienced the exact same issue with the combination of pymysql and pandas.to_sql.

Update: here is what worked for me.

Instead of passing the charset as an argument, try attaching it directly to the connection string:

connect_string = 'mysql+pymysql://{}:{}@{}:{}/{}?charset=utf8'.format(DB_USER, DB_PASS, DB_HOST, DB_PORT, DATABASE)
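One caveat with formatting the string by hand is that a password containing characters such as @ or / can break the URL. Below is a rough sketch of a helper that percent-encodes the credentials with urllib.parse.quote_plus before appending the charset; the helper name build_mysql_url and the sample credentials are made up for illustration.

```python
from urllib.parse import quote_plus


def build_mysql_url(user, password, host, port, database, charset="utf8"):
    """Build a mysql+pymysql URL with the charset in the query string.

    User and password are percent-encoded so that characters such as
    '@' or '/' do not get mistaken for URL delimiters.
    """
    return "mysql+pymysql://{}:{}@{}:{}/{}?charset={}".format(
        quote_plus(user), quote_plus(password), host, port, database, charset
    )


# Hypothetical credentials, for illustration only.
print(build_mysql_url("app", "p@ss/word", "localhost", 3306, "ceds"))
# mysql+pymysql://app:p%40ss%2Fword@localhost:3306/ceds?charset=utf8
```

If the data may contain characters outside the Basic Multilingual Plane (emoji, for example), passing charset="utf8mb4" is probably the safer choice on MySQL.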

The problem seems to occur in pymysql: the encoding you define is apparently not forwarded and applied when the pymysql connection is set up.

For the sake of debugging, I hardcoded

encoding = 'utf-8'

in the pymysql _do_execute_many function, and that explained it to me.

alybel answered Nov 13 '22 02:11


I experienced a similar problem on Python 3.7: UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd' in position 0: character maps to <undefined>

It was the way I defined my engine. I had the charset set to utf-8 in my engine, yet it was not picked up:

# Connecting to the database(reference for checkout_listener not added)
def MysqlConnection(DbName):
    DB_TYPE = 'mysql'
    DB_DRIVER = 'mysqldb'
    DB_NAME = DbName
    POOL_SIZE = 100
    CHARSET = 'utf-8'

    SQLALCHEMY_DATABASE_URI = '%s+%s://%s:%s@%s:%s/%s?%s' % (DB_TYPE, DB_DRIVER, DB_USER,
                                                             DB_PASS, DB_HOST, DB_PORT, DB_NAME, CHARSET)
    ENGINE1 = create_engine(
        SQLALCHEMY_DATABASE_URI, pool_size=POOL_SIZE, pool_recycle=3600, echo=False)
    event.listen(ENGINE1, 'checkout', checkout_listener)
    return (ENGINE1);

This worked fine on Python 2, but on Python 3 the charmap error would occur. The only solution I found was to define the engine differently and add the charset to the connection string:

connection_string = f"{mysql_user}:{mysql_password}@localhost:3306/{db_name}?charset=utf8"
engine = create_engine(f'mysql://{connection_string}')
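One possible reading of why the first version had no effect: its URL ends in ?utf-8, which is a bare token with no key, whereas the second ends in ?charset=utf8. The sketch below uses the standard library's URL parser as an approximation to show the difference; SQLAlchemy and the driver do their own parsing, so this is only an illustration, and the helper name url_options and both URLs are placeholders.

```python
from urllib.parse import parse_qs, urlsplit


def url_options(url):
    """Return the key=value options found in a URL's query string."""
    return parse_qs(urlsplit(url).query)


# Placeholder URLs mirroring the two shapes shown above.
print(url_options("mysql+mysqldb://user:pw@localhost:3306/mydb?utf-8"))
# {}
print(url_options("mysql://user:pw@localhost:3306/mydb?charset=utf8"))
# {'charset': ['utf8']}
```

Under this parser the first form carries no charset option at all, which would be consistent with the setting not being picked up.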
user13089205 answered Nov 13 '22 02:11


I solved the issue by changing the character set of the MySQL database to UTF-8 and adding this to the pymysql connection: charset='utf8'.

David Incio answered Nov 13 '22 01:11