 

Hex characters in regexp matching in mysql

Tags:

regex

mysql

I've spotted some very odd behavior in MySQL. The select below returns 0:

SELECT CONVERT('a' USING BINARY) REGEXP '[\x61]'

However, this select, which looks semantically identical, returns 1:

SELECT CONVERT('a' USING BINARY) REGEXP '[\x61-\x61]'

Do you know what is happening here? I've tested this in MySQL 5.0.0.3031 and 4.1.22.

I need the hex characters to build a regexp that matches when a binary string is encoded in UTF-8. A Perl version of such a regexp can be found on the W3C site. It looks as follows:

$field =~
      m/\A(
         [\x09\x0A\x0D\x20-\x7E]            # ASCII
       | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
       |  \xE0[\xA0-\xBF][\x80-\xBF]        # excluding overlongs
       | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
       |  \xED[\x80-\x9F][\x80-\xBF]        # excluding surrogates
       |  \xF0[\x90-\xBF][\x80-\xBF]{2}     # planes 1-3
       | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
       |  \xF4[\x80-\x8F][\x80-\xBF]{2}     # plane 16
      )*\z/x;
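
For comparison, here is a rough Python port of the pattern above, as a sketch only. Python's re module resolves \xNN escapes itself, which is exactly what the MySQL queries assume the server does. The names UTF8_RE and is_utf8 are made up for this illustration.

```python
import re

# Hypothetical port of the W3C pattern to a Python bytes regex.
# re.VERBOSE allows the same layout and comments as Perl's /x flag;
# \Z in Python plays the role of Perl's \z (end of input only).
UTF8_RE = re.compile(
    rb"""\A(?:
       [\x09\x0A\x0D\x20-\x7E]            # ASCII
     | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
     |  \xE0[\xA0-\xBF][\x80-\xBF]        # excluding overlongs
     | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
     |  \xED[\x80-\x9F][\x80-\xBF]        # excluding surrogates
     |  \xF0[\x90-\xBF][\x80-\xBF]{2}     # planes 1-3
     | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
     |  \xF4[\x80-\x8F][\x80-\xBF]{2}     # plane 16
    )*\Z""",
    re.VERBOSE,
)


def is_utf8(data: bytes) -> bool:
    """Return True if data matches the W3C-style UTF-8 pattern."""
    return UTF8_RE.match(data) is not None
```

Note that, like the Perl original, this rejects most ASCII control characters, so it is stricter than a plain UTF-8 decoder.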
Piotr Czapla asked Feb 04 '10 12:02


2 Answers

This matches too:

SELECT CONVERT('a' USING BINARY) REGEXP '[1-\x]'

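The reason is that \x is interpreted as x, and a comes between 1 and x. MySQL does not recognize \x as an escape sequence in a string literal, so it simply drops the backslash: '[\x61-\x61]' reaches the regexp engine as '[x61-x61]', a character class containing x, 6, the range 1-x, 6 and 1. The rest of your regex is just ordinary characters that aren't relevant here because they're already inside the [1-x] range. For the same reason, '[\x61]' becomes '[x61]', which matches only x, 6 or 1, so it returns 0 for 'a'.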

SELECT CONVERT('0' USING BINARY) REGEXP '[\x61-\x61]' -- Fails, because 0 < 1.
SELECT CONVERT('1' USING BINARY) REGEXP '[\x61-\x61]' -- Succeeds: inside [1-x].
SELECT CONVERT('2' USING BINARY) REGEXP '[\x61-\x61]' -- Succeeds: inside [1-x].
...
SELECT CONVERT('w' USING BINARY) REGEXP '[\x61-\x61]' -- Succeeds: inside [1-x].
SELECT CONVERT('x' USING BINARY) REGEXP '[\x61-\x61]' -- Succeeds: inside [1-x].
SELECT CONVERT('y' USING BINARY) REGEXP '[\x61-\x61]' -- Fails, because y > x.
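
The behavior above can be reproduced outside MySQL with a small Python sketch. It assumes a simplified model of MySQL's string-literal rules: a handful of known escapes are translated, and for any other backslashed character the backslash is dropped. The helper mysql_unescape is hypothetical and does not model every case (for example \% and \_).

```python
import re

# Subset of the escapes MySQL recognizes in string literals (illustrative only).
_MYSQL_ESCAPES = {
    "0": "\0", "n": "\n", "r": "\r", "t": "\t", "b": "\b",
    "Z": "\x1a", "\\": "\\", "'": "'", '"': '"',
}


def mysql_unescape(literal: str) -> str:
    """Hypothetical helper: approximate what MySQL does to a quoted literal."""
    out = []
    i = 0
    while i < len(literal):
        ch = literal[i]
        if ch == "\\" and i + 1 < len(literal):
            nxt = literal[i + 1]
            # Unknown escapes such as \x lose the backslash and keep the char.
            out.append(_MYSQL_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


single = mysql_unescape(r"[\x61]")       # what the regexp engine receives
ranged = mysql_unescape(r"[\x61-\x61]")  # likewise for the range version

print(single)  # [x61]
print(ranged)  # [x61-x61]
print(bool(re.search(single, "a")))  # False
print(bool(re.search(ranged, "a")))  # True
```

Python's re treats the resulting character classes the same way here, so the sketch matches '1' through 'x' and rejects '0' and 'y', as in the queries above.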

I'm not sure what you're trying to achieve, but if you want the hex value of a character, you can use the HEX() function:

SELECT HEX('a') -- Returns 61.
Mark Byers answered Sep 19 '22 14:09


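To write a regexp like [\x61-\x65] in MySQL, you can use hex literals inside a CONCAT. In a string context a hex literal such as 0x61 evaluates to the corresponding byte, so the pattern below is '[a-e]':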

SELECT CONVERT('a' USING BINARY) REGEXP CONCAT('[', 0x61, '-', 0x65, ']')
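
As a rough illustration of what that CONCAT produces, the same pattern can be assembled byte by byte in Python. This is a sketch of the idea only, not MySQL code, and the variable name pattern is arbitrary.

```python
import re

# 0x61 and 0x65 are the single bytes 'a' and 'e'; gluing them between
# '[', '-' and ']' yields an ordinary character class.
pattern = b"[" + bytes.fromhex("61") + b"-" + bytes.fromhex("65") + b"]"

print(pattern)                             # b'[a-e]'
print(bool(re.search(pattern, b"a")))      # True
print(bool(re.search(pattern, b"f")))      # False
```

Because the bytes are inserted directly, nothing depends on the regexp engine understanding \x escapes.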
Puggan Se answered Sep 19 '22 14:09