Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Uploading Python Pandas dataframe to MySQL - InternalError: 1366, "Incorrect String Value"

I am trying to write a Pandas dataframe into MySQL database, and have used the following code:

engine = sqlalchemy.create_engine("mysql+pymysql://root:password@localhost/skills?charset=utf8mb4")
connection = engine.connect
dataframe.head().to_sql('indeed_resumes', engine, flavor='mysql', if_exists='replace',index=True)

However, I get the following error:

InternalError: (1366, "Incorrect string value: '\\xE1\\xBB\\x99i\\x0AO...' for column 'work' at row 5")

The datatype of the MySQL table is as follows:

 mysql> desc indeed_resumes;
    +-----------+------------+------+-----+---------+-------+
    | Field     | Type       | Null | Key | Default | Extra |
    +-----------+------------+------+-----+---------+-------+
    | index     | bigint(20) | YES  | MUL | NULL    |       |
    | certs     | text       | YES  |     | NULL    |       |
    | contact   | text       | YES  |     | NULL    |       |
    | education | text       | YES  |     | NULL    |       |
    | headline  | text       | YES  |     | NULL    |       |
    | info      | text       | YES  |     | NULL    |       |
    | skills    | text       | YES  |     | NULL    |       |
    | summary   | text       | YES  |     | NULL    |       |
    | updated   | text       | YES  |     | NULL    |       |
    | work      | text       | YES  |     | NULL    |       |
    +-----------+------------+------+-----+---------+-------+
    10 rows in set (0.00 sec)

My data consists of very long strings (sometimes around 3000 characters), so this could be causing the error. Any suggestions?

like image 212
runawaykid Avatar asked Dec 16 '15 06:12

runawaykid


1 Answers

I seem to have resolved this issue. It looks like I also needed to change the database encoding using the following commands.

ALTER DATABASE skills CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
ALTER TABLE indeed_resumes CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

From https://mathiasbynens.be/notes/mysql-utf8mb4:

"Turns out MySQL’s utf8 charset only partially implements proper UTF-8 encoding. It can only store UTF-8-encoded symbols that consist of one to three bytes; encoded symbols that take up four bytes aren’t supported.

Luckily, MySQL 5.5.3 (released in early 2010) introduced a new encoding called utf8mb4 which maps to proper UTF-8 and thus fully supports Unicode, including astral symbols."

like image 92
runawaykid Avatar answered Sep 28 '22 05:09

runawaykid