MySQL CHAR() Function and UTF8 Output?

Tags: mysql, escaping, unicode, utf-8, string-literals

+--------------------------+--------------------------------------------------------+
| Variable_name            | Value                                                  |
+--------------------------+--------------------------------------------------------+
| character_set_client     | utf8                                                   |
| character_set_connection | utf8                                                   |
| character_set_database   | utf8                                                   |
| character_set_filesystem | binary                                                 |
| character_set_results    | utf8                                                   |
| character_set_server     | utf8                                                   |
| character_set_system     | utf8                                                   |
| character_sets_dir       | /usr/local/mysql-5.1.41-osx10.5-x86_64/share/charsets/ |
+--------------------------+--------------------------------------------------------+
8 rows in set (0.00 sec)

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.1.41    |
+-----------+
1 row in set (0.00 sec)

mysql> select char(0x00FC);
+--------------+
| char(0x00FC) |
+--------------+
| ?            |
+--------------+
1 row in set (0.00 sec)

I expected the actual UTF-8 character "ü" instead of "?". I also tried char(0x00FC using utf8), but no luck.

Using mysql version 5.1.41

I've been all over Google and cannot find anything on this. The MySQL docs simply say that multibyte output is expected for values greater than 255, as of MySQL version 5.0.14.

Thanks

asked Mar 05 '10 02:03

jason

2 Answers

You are confusing UTF-8 with Unicode.

0x00FC is the Unicode code point for ü:

mysql> select char(0x00FC using ucs2);
+-------------------------+
| char(0x00FC using ucs2) |
+-------------------------+
| ü                       |
+-------------------------+

In UTF-8, the code point U+00FC is encoded as two bytes, 0xC3 0xBC:

mysql> select char(0xC3BC using utf8);
+-------------------------+
| char(0xC3BC using utf8) |
+-------------------------+
| ü                       |
+-------------------------+

UTF-8 is merely one way of encoding Unicode code points as bytes. It is designed to be space-efficient for Latin-script text: ASCII characters take a single byte, and the remaining ISO-8859-1 characters, such as ü, take two bytes. Other characters take three or four bytes, but they are less common in Western-language text.
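
To make the distinction between a code point and its encoded bytes concrete, here is a minimal Python sketch (Python is used only as a neutral calculator here; it is not part of the original question). It assumes that Python's `utf-16-be` codec produces the same bytes as UCS-2 for a BMP character such as ü. The last check shows that a lone 0xFC byte is not valid UTF-8, which is consistent with the "?" seen in the question, although the thread does not confirm exactly which bytes MySQL returned there.

```python
# Code point U+00FC (ü) and its byte sequences under two encodings.
code_point = 0x00FC
char = chr(code_point)

utf8_bytes = char.encode("utf-8")      # UTF-8 needs two bytes for U+00FC
ucs2_bytes = char.encode("utf-16-be")  # for BMP characters, UTF-16BE matches UCS-2

print(utf8_bytes.hex())  # c3bc
print(ucs2_bytes.hex())  # 00fc

# A lone 0xFC byte is not a valid UTF-8 sequence, so a UTF-8 client
# cannot decode it as "ü".
try:
    bytes([0xFC]).decode("utf-8")
    lone_byte_is_valid = True
except UnicodeDecodeError:
    lone_byte_is_valid = False

print(lone_byte_is_valid)  # False
```

The hex values match the two queries above: 0x00FC under ucs2 and 0xC3BC under utf8.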

answered Sep 18 '22 12:09

Martin

Adding to Martin's answer:

You can use an "introducer" instead of the CHAR() function. To do this, write the encoding name, prefixed with an underscore, before the hex value:
```
_utf16 0xFC
```
or:
```
_utf16 0x00FC
```
If the goal is to specify the code point rather than the encoded byte sequence, you need an encoding in which the code point value happens to be the encoded byte sequence. For example, as shown in Martin's answer, 0x00FC is both the code point value for ü and its encoded byte sequence in ucs2 / utf16. The two are effectively the same encoding for BMP characters, but I prefer "utf16" because it fits the same "utf" naming as "utf8" and "utf32".

However, utf16 only lets you specify the code point value directly for BMP characters (code points U+0000 to U+FFFF). For a supplementary character, you need the utf32 encoding. Not only does _utf32 0xFC return ü, but:
```
_utf32 0x1F47E
```
returns: 👾
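
The rule above (the code point equals the encoded bytes only in certain encodings) can be checked outside MySQL. The following is a rough Python sketch, not MySQL code: the helper names `encoded_hex` and `matches_code_point` are made up for illustration, and it assumes MySQL's utf16 and utf32 character sets correspond to Python's big-endian `utf-16-be` and `utf-32-be` codecs.

```python
def encoded_hex(code_point: int, encoding: str) -> str:
    """Return the encoded bytes of a code point as uppercase hex."""
    return chr(code_point).encode(encoding).hex().upper()


def matches_code_point(code_point: int, encoding: str) -> bool:
    """True when the encoded bytes, read as one integer, equal the code point."""
    return int(encoded_hex(code_point, encoding), 16) == code_point


# Compare a BMP character (ü) with a supplementary character (👾).
for cp in (0x00FC, 0x1F47E):
    for enc in ("utf-8", "utf-16-be", "utf-32-be"):
        print(f"U+{cp:04X} {enc:<9} {encoded_hex(cp, enc):<8} "
              f"{matches_code_point(cp, enc)}")
```

For U+00FC the match holds in both UTF-16 and UTF-32, while for U+1F47E UTF-16 produces the surrogate pair D83D DC7E, so only UTF-32 carries the code point value unchanged.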

For more details on these options, plus Unicode escape sequences for other languages and platforms, please see my post:

Unicode Escape Sequences Across Various Languages and Platforms (including Supplementary Characters)

answered Sep 22 '22 12:09

Solomon Rutzky
