UTF-8 data in Latin1 database: can it be saved?

Tags: mysql, character-encoding, ruby-on-rails, unicode, latin1

I have a Rails app that receives data from an Android device. I noticed that some of the data, when in Japanese, is not saved correctly. It shows up as literal question marks (not the diamond-shaped replacement character) in both the MySQL client and the Rails website.

It turns out that the database my Rails app connects to is set to Latin1, while Rails is set to UTF-8.

I've read a lot about character encodings, but every case I found involves data that is still at least partly readable. Mine, however, consists only of literal question marks. Converting the data to UTF-8 using several methods I found on the web doesn't change anything either. I suspect that the data is converted to question marks when it's written to the database.

Sample output from the MySQL console:


```
select * from foo where bar = "foobar";
+-------+------+------------------------+---------------------+---------------------+
| id    | name | bar                    | created_at          | updated_at          |
+-------+------+------------------------+---------------------+---------------------+
| 24300 | ???? | foobar                 | 2012-01-23 05:04:22 | 2012-01-23 05:04:22 |
+-------+------+------------------------+---------------------+---------------------+
1 row in set (0.00 sec)
```

The input data that my Rails app received from the Android client was:


```
name = 爆笑笑話
```

I've verified that this data is intact in the Rails app right before it is saved to the database, so it isn't being mangled by the Android client or during transfer to the server. Is there any chance I can get this data back, or is it completely lost?


asked Dec 22 '12 00:12

Peterdk

1 Answer

It's actually very easy to think that data is encoded one way when it is actually encoded another way. Any attempt to retrieve the data directly results in it being converted first to the character set of your database connection and then to the character set of your output medium. You should therefore first verify the actual encoding of your stored data with either `SELECT BINARY name FROM foo WHERE bar = 'foobar'` or `SELECT HEX(name) FROM foo WHERE bar = 'foobar'`.
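The two possible outcomes of that check can be previewed with a few lines of Python, used here only because it makes byte sequences easy to show. This sketch does not query MySQL. It only computes what `SELECT HEX(name)` would be expected to return for 爆笑笑話 in each of the two cases discussed below. MySQL's HEX() prints uppercase digits, which is why the sketch calls `.upper()`.

```python
expected = "爆笑笑話"

# Case 1: the column holds the UTF-8 bytes of the text, mislabelled as latin1.
utf8_hex = expected.encode("utf-8").hex().upper()

# Case 2: every character was replaced by a literal '?' (byte 0x3F) on the way in.
lost_hex = ("?" * len(expected)).encode("ascii").hex().upper()

print(utf8_hex)  # E78886E7AC91E7AC91E8A9B1  (E78886 is 爆)
print(lost_hex)  # 3F3F3F3F
```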

Where the character 爆 is expected, you will likely find either of the following byte sequences:

`0xe78886`, indicating that your column actually contains UTF-8 encoded data. This usually happens when the character set of the database connection over which the text was originally inserted was set to latin1, but UTF-8 encoded data was actually sent.

In this case, you must be seeing ? characters when fetching the data because something between the data storage and the display has been unable to transcode those bytes. However, MySQL thinks they represent çˆ†, and those characters are likely available in most character sets. It is therefore unlikely that this is happening within MySQL itself, unless you're explicitly adjusting the encoding information during retrieval.
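The following Python sketch shows what is going on at the byte level. MySQL's `latin1` character set is in fact Windows-1252, which corresponds to Python's `cp1252` codec. Decoding the stored bytes that way reproduces the çˆ† that MySQL believes is in the column. Decoding the same bytes as UTF-8 gives back the original character. This is only an illustration and does not touch the database.

```python
stored = bytes.fromhex("E78886")  # the bytes HEX(name) would report for 爆

# MySQL's latin1 character set corresponds to Windows-1252 (cp1252).
as_mysql_latin1 = stored.decode("cp1252")

# The very same bytes interpreted as UTF-8.
as_utf8 = stored.decode("utf-8")

print(as_mysql_latin1)  # çˆ†
print(as_utf8)          # 爆
```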

Anyway, if this is the case, you need to drop the encoding information from the column and then tell MySQL that the data is actually encoded as UTF-8. As documented under ALTER TABLE Syntax:
Warning

The CONVERT TO operation converts column values between the character sets. This is not what you want if you have a column in one character set (like latin1) but the stored values actually use some other, incompatible character set (like utf8). In this case, you have to do the following for each such column:

```
ALTER TABLE t1 CHANGE c1 c1 BLOB;
ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8;
```
The reason this works is that there is no conversion when you convert to or from BLOB columns.
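The difference between `CONVERT TO` and the BLOB round trip can be simulated in Python, again assuming that MySQL's latin1 behaves like cp1252. `CONVERT TO` transcodes the characters MySQL thinks are stored, which encodes the UTF-8 bytes a second time. Going through BLOB leaves the bytes alone and only changes their label.

The `repaired` line shows an equivalent per-value fix in application code. Python's cp1252 codec leaves a few bytes undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) where MySQL's latin1 assigns characters, so that shortcut can fail on other data.

```python
original = "爆笑笑話"

# Bytes actually in the latin1 column: UTF-8 sent over a latin1 connection.
stored = original.encode("utf-8")

# Characters MySQL thinks those bytes represent (its latin1 ~ cp1252).
believed = stored.decode("cp1252")

# CONVERT TO CHARACTER SET utf8: transcode the believed characters -> double encoding.
after_convert_to = believed.encode("utf-8")

# CHANGE c1 c1 BLOB, then TEXT CHARACTER SET utf8: same bytes, new label.
after_blob_round_trip = stored.decode("utf-8")

# Per-value repair in application code: undo the misreading, then decode as UTF-8.
repaired = believed.encode("cp1252").decode("utf-8")

print(after_convert_to.decode("utf-8") == believed)  # True: still mojibake
print(after_blob_round_trip)  # 爆笑笑話
print(repaired == original)   # True
```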
`0x3f`, indicating that the database actually contains the literal character ? and your original data has been lost. This doesn't happen easily, since MySQL usually throws error 1366 if implicit transcoding results in loss of data. Perhaps there was some explicit transcoding in your insert statement?

In this case, you need to convert the column's character set to one that can represent the data, then update or re-insert the data:

```
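-- Re-declare the table's text columns as utf8, transcoding the existing values.
ALTER TABLE foo CONVERT TO CHARACTER SET utf8;
-- Write the value again; it cannot be recovered from the table itself.
UPDATE foo SET name = _utf8'爆笑笑話' WHERE bar = 'foobar';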
```
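For this second case, here is a Python sketch of how the question marks can arise and how the fix could be checked. Python's `errors="replace"` encoding handler is used purely as an analogy for a lossy transcoding step that substitutes `?` for characters latin1 cannot represent. A strict conversion refuses instead, much as MySQL's error 1366 would.

The last lines suggest a sanity check after the `UPDATE`. Once `name` is stored as utf8, MySQL's `CHAR_LENGTH(name)` (characters) should report 4 and `LENGTH(name)` (bytes) should report 12 for this value.

```python
original = "爆笑笑話"

# Lossy transcoding to latin1: each unrepresentable character becomes b"?".
lossy = original.encode("latin-1", errors="replace")
print(lossy)  # b'????'

# A strict transcoding refuses rather than silently losing data.
try:
    original.encode("latin-1")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
print(strict_failed)  # True

# Expected CHAR_LENGTH and LENGTH once the value is stored as utf8.
char_length = len(original)
byte_length = len(original.encode("utf-8"))
print(char_length, byte_length)  # 4 12
```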


answered Oct 15 '22 09:10

eggyal
