Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL Convert latin1 data to UTF8

I imported some data using LOAD DATA INFILE into a MySQL Database. The table itself and the columns are using the UTF8 character set, but the default character set of the database is latin 1. Because the default character type of the database is latin1, and I used LOAD DATA INFILE without specifying a character set, it interpreted the file as latin1, even though the data in the file was UTF8. Now I have a bunch of badly encoded data in my UTF8 colum. I found this article which seems to address a similar problem, which is "UTF8 inserted in cp1251", but my problem is "Latin1 inserted in UTF8". I've tried editing the queries there to convert the latin1 data to UTF8, but can't get it to work. Either the data comes out the same, or even more mangled than before. Just as an example, the word Québec is showing as Québec.

[ADDITIONAL INFO]

When Selecting the data wrapped in HEX(), Québec has the value 5175C383C2A9626563.

The Create Table (shortened) of this table is.

CREATE TABLE MyDBName.`MyTableName`
(
`ID` INT NOT NULL AUTO_INCREMENT, 
.......
`City` CHAR(32) NULL, 
.......
`)) ENGINE InnoDB CHARACTER SET utf8;
like image 211
Kibbee Avatar asked Sep 17 '09 19:09

Kibbee


2 Answers

I've had cases like this in old wordpress installations with the problem being that the data itself was already in UTF-8 within a Latin1 database (due to WP default charset). This means there was no real need for conversion of the data but the ddbb and table formats. In my experience things get messed up when doing the dump as I understand MySQL will use the client's default character set which in many cases is now UTF-8. Therefore making sure that exporting with the same coding of the data is very important. In case of Latin1 DDBB with UTF-8 coding:

$ mysqldump –default-character-set=latin1 –databases wordpress > m.sql

Then replace the Latin1 references within the exported dump before reimporting to a new database in UTF-8. Sort of:

$ replace "CHARSET=latin1" "CHARSET=utf8" \
    "SET NAMES latin1" "SET NAMES utf8" < m.sql > m2.sql

In my case this link was of great help. Commented here in spanish.

like image 114
luison Avatar answered Oct 02 '22 04:10

luison


Though it is hardly still actual for the OP, I happen to have found a solution in MySQL documentation for ALTER TABLE. I post it here just for future reference:

Warning

The CONVERT TO operation converts column values between the character sets. This is not what you want if you have a column in one character set (like latin1) but the stored values actually use some other, incompatible character set (like utf8). In this case, you have to do the following for each such column:

ALTER TABLE t1 CHANGE c1 c1 BLOB;
ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8;

The reason this works is that there is no conversion when you convert to or from BLOB columns.

like image 21
newtover Avatar answered Oct 02 '22 02:10

newtover