I have a MySQL table properly set to the UTF-8 character set. I suspect some data inserted into one of my columns has been double encoded. I am expecting to see a non-breaking space character (UTF-8 0xC2A0), but what I get when selecting this column out of this table is four octets (0xC3A2 0xC2A0). That's what I would expect to see if at some point somebody had treated an UTF-8 0xC2A0 as ISO-8859-1 then attempted to encode again to UTF-8 before inserting into MySQL.
My test above where I am seeing the four octets involves selecting this column out of MySQL with Perl's DBD::mysql. I'd like to take Perl and DBD::mysql out of the equation to verify that those four octets are actually what MySQL has stored. Is there a way to do this directly with a SQL query?
If you have a blank database, click on the database name in the left hand frame. Select the Table that you want to browse the data from and press the Browse icon. The data in the table will then be displayed.
The VARBINARY type is similar to the VARCHAR type, but stores binary byte strings rather than non-binary character strings. M represents the maximum column length in bytes. It contains no character set, and comparison and sorting are based on the numeric value of the bytes.
TINYTEXT can store up to 255 characters i.e 255 bytes. It is suitable for storing summaries of articles, short notices, captions, etc. It takes 1-Byte overhead. MEDIUMTEXT. MEDIUMTEXT can store up to 16,777,215 characters i.e 16,777,215 bytes or 64MB of data.
mysql> SELECT HEX(name) FROM mytable;
+-----------+
| hex(name) |
+-----------+
| 4142C2A0 |
+-----------+
Why not try the BINARY operator?
"The BINARY operator casts the string following it to a binary string. This is an easy way to force a column comparison to be done byte by byte rather than character by character."
http://dev.mysql.com/doc/refman/5.0/en/cast-functions.html
Hope this helps!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With