Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to convert wrongly encoded data to UTF-8?

I'm working with data from an old mysql database. There's a table in this database with a string column that has its encoding set to "cp1252 West European (latin1)" (same as Windows-1252). When querying the data from mysql command prompt, data from this field is represented as:

Obama’s

This is supposed to read

Obama’s

I've tried following the accepted answer for How to convert an entire MySQL database characterset and collation to UTF-8? to convert the field to UTF-8 in MySQL, but it makes no difference.

I also tried inserting a new row into that table, using Obama’s as the text for that field (again, from the mysql command prompt). However, this text is correctly represented when I then query the same row I just inserted. I tried performing that insertion both when the field was set to latin1 and when it was set to UTF-8. Same result.

This leads me to believe that when the bad data was inserted into the database, it was first incorrectly encoded by PHP. This is where it gets fuzzy to me.

I can assume that the data was inserted via a web form and processed with PHP. What did PHP do with it before inserting it into the database? Did it convert the string to UTF-8, which according to the table on this helpful page, uses the three bytes %E2 %80 %99 to represent the character. Do I have that right?

If that's correct, what are my options to repair this data? I'd like to convert the table and its fields to UTF-8 encodings, but that doesn't seem to fix the text. Do I have to write a script that manually changes those characters to what they should be?

like image 686
Brian Avatar asked Nov 22 '13 18:11

Brian


1 Answers

select convert(binary convert(field_name using latin1) using utf8) from table_name

If this displays correctly you can do update.

like image 55
Imre L Avatar answered Sep 26 '22 06:09

Imre L