Does MySQL REGEXP support Unicode matching?

Does anyone know if MySQL's REGEXP supports Unicode? I've been doing some research, and the majority of blogs etc. seem to indicate that there is a problem or that it's not supported. I'm wondering, then, whether it is best to use LIKE for Unicode pattern matching and REGEXP for enhanced ASCII pattern matching?

I like the idea of being able to search for matches at the beginning or end of a string, but if REGEXP doesn't support Unicode, this could be difficult when my text is Unicode.

user1236443 asked Jan 16 '13 10:01


2 Answers

  1. Does anyone know if MySQL's REGEXP supports Unicode? I've been doing some research, and the majority of blogs etc. seem to indicate that there is a problem or that it's not supported.

    As documented under Regular Expressions:

    Warning

    The REGEXP and RLIKE operators work in byte-wise fashion, so they are not multi-byte safe and may produce unexpected results with multi-byte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.

  2. I'm wondering, then, whether it is best to use LIKE for Unicode pattern matching and REGEXP for enhanced ASCII pattern matching?

    Yes, that would be best.

  3. I like the idea of being able to search for matches at the beginning or end of a string, but if REGEXP doesn't support Unicode, this could be difficult when my text is Unicode.

    One can do that with LIKE too:

    WHERE foo LIKE 'bar%'
    

    And:

    WHERE foo LIKE '%bar'
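
The points above can be tried on a scratch server. This is only a minimal sketch: the table `notes`, its `body` column, and the sample strings are hypothetical and not from the question, and it assumes a client that sends UTF-8. Results are deliberately not shown. On a server whose REGEXP is byte-wise, as the quoted warning describes, one would expect the `^.$` test to fail for a two-byte character while `LIKE '_'` succeeds, but that depends on the server version.

```sql
-- Assumption: the client sends UTF-8, so 'é' arrives as two bytes.
SET NAMES utf8mb4;

-- Hypothetical scratch table; the name and columns are illustrative only.
CREATE TEMPORARY TABLE notes (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    body VARCHAR(100)
) CHARACTER SET utf8mb4;

INSERT INTO notes (body) VALUES ('über uns'), ('gegenüber'), ('plain ascii');

-- Anchored matching with LIKE, which works character by character.
SELECT body FROM notes WHERE body LIKE 'über%';   -- begins with "über"
SELECT body FROM notes WHERE body LIKE '%über';   -- ends with "über"

-- Character count versus byte count for one accented letter, and how
-- each operator treats the pattern "exactly one character".
SELECT
    CHAR_LENGTH('é') AS char_count,
    LENGTH('é')      AS byte_count,
    'é' REGEXP '^.$' AS regexp_single,
    'é' LIKE '_'     AS like_single;
```

Note that LIKE compares using the collation in effect, so whether 'ü' also matches 'u' depends on the collation of the column.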
    
eggyal answered Sep 21 '22 06:09


MariaDB starting with 10.0.5:

REGEXP/RLIKE, and the new functions REGEXP_REPLACE(), REGEXP_INSTR() and REGEXP_SUBSTR(), now work correctly with all multi-byte character sets supported by MariaDB, including East-Asian character sets (big5, gb2312, gbk, eucjp, eucjpms, cp932, ujis, euckr), and Unicode character sets (utf8, utf8mb4, ucs2, utf16, utf16le, utf32). In earlier versions of MariaDB (and in MySQL before 8.0.4, which moved its regular expression support to the ICU library) REGEXP/RLIKE works correctly only with 8-bit character sets.
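
As a rough illustration of the functions named above, the query below exercises each of them on a string containing multi-byte characters. It is a sketch that assumes MariaDB 10.0.5 or later and a UTF-8 connection; the sample word and the column aliases are made up for the example, and no output is shown since it has not been run against a particular server here.

```sql
-- Assumption: MariaDB 10.0.5+ and a client that sends UTF-8.
SET NAMES utf8mb4;

SELECT
    'żółw' REGEXP '^.{4}$'           AS is_four_chars,  -- length test in characters
    REGEXP_REPLACE('żółw', 'ó', 'o') AS replaced,       -- substitute one accented letter
    REGEXP_INSTR('żółw', 'ł')        AS pos,            -- where the pattern first matches
    REGEXP_SUBSTR('żółw', '^..')     AS first_two;      -- leading two-character substring
```

If the server is multi-byte aware, the `.` in these patterns should consume whole characters and not single bytes, which is the behaviour the byte-wise implementation lacks.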

Rick James answered Sep 23 '22 06:09
