I'm working with data from an old MySQL database. One table has a string column whose encoding is set to "cp1252 West European (latin1)" (the same as Windows-1252). When I query the data from the mysql command prompt, the value in this field is displayed as:
Obama’s
This is supposed to read
Obama’s
I've tried following the accepted answer for How to convert an entire MySQL database characterset and collation to UTF-8? to convert the field to UTF-8 in MySQL, but it makes no difference.
I also tried inserting a new row into that table, using Obama’s as the text for that field (again, from the mysql command prompt). That text is displayed correctly when I query the row I just inserted. I tried the insertion both with the field set to latin1 and with it set to UTF-8, and got the same result.
This leads me to believe that when the bad data was inserted into the database, it was first incorrectly encoded by PHP. This is where it gets fuzzy to me.
I assume the data was inserted via a web form and processed with PHP. What did PHP do with it before inserting it into the database? Did it convert the string to UTF-8, which, according to the table on this helpful page, uses the three bytes %E2 %80 %99 to represent the ’ character? Do I have that right?
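That guess about the bytes can be checked outside the database. The following is a minimal sketch, assuming Python 3.8 or later; it is not the PHP code that originally wrote the data, only an illustration of what happens when the UTF-8 bytes of ’ are read back as Windows-1252.

```python
# The right single quotation mark, U+2019.
original = "\u2019"

# UTF-8 encodes it as the three bytes E2 80 99.
utf8_bytes = original.encode("utf-8")

# Reading those same three bytes as Windows-1252 gives three separate
# characters: U+00E2, U+20AC and U+2122, which is the garbled sequence
# seen in the query output.
misread = utf8_bytes.decode("cp1252")

print(utf8_bytes.hex(" "))  # e2 80 99
print(misread)
```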
If that's correct, what are my options to repair this data? I'd like to convert the table and its fields to UTF-8 encodings, but that doesn't seem to fix the text. Do I have to write a script that manually changes those characters to what they should be?
select convert(binary convert(field_name using latin1) using utf8) from table_name
If this displays the text correctly, you can run an UPDATE that writes the same expression back to the column.
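If the repair has to be done in a script instead, the same idea (take the stored characters back to their Windows-1252 bytes, then read those bytes as UTF-8) might look like the sketch below. It assumes Python 3; `repair_mojibake` is a hypothetical helper name, and the fallback that leaves a value unchanged when the round trip fails is an assumption meant to protect rows that were stored correctly.

```python
def repair_mojibake(text: str) -> str:
    """Undo a UTF-8-read-as-cp1252 mix-up; return text unchanged if it does not apply."""
    try:
        # Recover the original bytes, then decode them as the UTF-8 they really were.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The value was not double-encoded (or uses characters outside cp1252),
        # so keep it as it is.
        return text


# Garbled value from the old rows: U+00E2 U+20AC U+2122 in place of U+2019.
garbled = "Obama\u00e2\u20ac\u2122s"
# Value from the newly inserted row, already correct.
correct = "Obama\u2019s"

print(repair_mojibake(garbled) == correct)  # True
print(repair_mojibake(correct) == correct)  # True
```

Because correctly stored values fail the round trip and are returned as they are, the function can be applied to every row without first separating good rows from bad ones.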