
Unicode (hexadecimal) character literals in MySQL

Is there a way to specify Unicode character literals in MySQL?

I want to replace a Unicode character with an ASCII character, something like the following:

Update MyTbl Set MyFld = Replace(MyFld, "ẏ", "y")

But I'm using even more obscure characters which are not available in most fonts, so I want to be able to use Unicode character literals, something like

Update MyTbl Set MyFld = Replace(MyFld, "\u1e8f", "y")

This SQL statement is being invoked from a PHP script - the first form is not only unreadable, but it doesn't actually work!

ChrisV asked Nov 23 '10 13:11



3 Answers

You can specify hexadecimal literals using 0x, x'', or X'' (binary literals use 0b or b''):

select  0xC2A2;
select x'C2A2';
select X'C2A2';

But be aware that the return type is a binary string, so each and every byte is considered a character. You can verify this with char_length:

select char_length(0xC2A2)

2

If you want UTF-8 strings instead, you need to use convert:

select convert(0xC2A2 using utf8mb4)

And we can see that C2 A2 is considered 1 character in UTF-8:

select char_length(convert(0xC2A2 using utf8mb4))

1
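
Applied to the question, the same bytes can also be written with a character set introducer, which keeps the statement readable. This is a minimal sketch, assuming MyFld is a utf8mb4 column and reusing the MyTbl and MyFld names from the question. E1 BA 8F is the UTF-8 encoding of U+1E8F (ẏ).

```sql
-- As utf8mb4, the three bytes E1 BA 8F form a single character.
select char_length(_utf8mb4 0xE1BA8F);  -- 1

-- Replace that character with a plain "y" (assumes MyFld is utf8mb4).
update MyTbl set MyFld = replace(MyFld, _utf8mb4 0xE1BA8F, 'y');
```

Writing convert(0xE1BA8F using utf8mb4) in place of the introducer should give the same string, although collation handling may differ depending on how the column is defined.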


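Also, convert does not pass invalid bytes through. On this server it drops them; depending on the MySQL version you may get NULL and a warning instead, so check the result rather than relying on it: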

select char_length(convert(0xC1A2 using utf8mb4))

0

As can be seen, the output is 0 because C1 A2 is an invalid UTF-8 byte sequence.

Pacerier answered Oct 24 '22 10:10


Thanks for your suggestions, but I think the problem was further back in the system.

There are a lot of layers to unpick, but as far as I can tell (on this server at least), the command

set names utf8

makes the utf-8 handling work correctly, whereas

set character set utf8

doesn't.

In my environment, these are being called from PHP using PDO, for what difference that may make.
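
The difference between the two statements is documented behaviour: SET NAMES sets the connection character set to the one named, while SET CHARACTER SET ties it to the database default. The sketch below spells out the documented equivalents. It uses utf8mb4 instead of utf8 so that four-byte characters are covered too, which is an assumption and not part of the original setup.

```sql
-- "set names utf8mb4" behaves like:
set character_set_client     = utf8mb4;
set character_set_results    = utf8mb4;
set character_set_connection = utf8mb4;

-- "set character set utf8mb4" behaves like:
set character_set_client  = utf8mb4;
set character_set_results = utf8mb4;
set collation_connection  = @@collation_database;  -- follows the database default

-- Inspect what the current connection actually uses.
show variables like 'character_set_%';
```

If the database default is not a UTF-8 character set, the second form leaves the connection in that default, which would explain the behaviour described above.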

Thanks anyway!

ChrisV answered Oct 24 '22 11:10


You can use the hex and unhex functions, e.g.:

update mytable set myfield = unhex(replace(hex(myfield),'C383','C3'))
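
One thing to watch with this approach is that replace works on hex digits, not on bytes, so a pattern can match across a byte boundary. Here is a small sketch using made-up bytes (1C 38 32 is an arbitrary example, not data from the question):

```sql
-- hex() yields '1C3832'; 'C383' matches from the second digit onwards,
-- straddling the byte boundaries of 1C 38 32.
select replace(hex(0x1C3832), 'C383', 'C3');  -- '1C32'
```

Running the expression as a select on a sample of rows before the update is a cheap way to spot such accidental matches.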
borrible answered Oct 24 '22 11:10