Does anyone know if MySQL's REGEXP supports Unicode? I've been doing some research, and the majority of blogs etc. seem to indicate that there is a problem or that it's not supported. I'm wondering, then, is it best to use LIKE for Unicode pattern matching and REGEXP for ASCII enhanced pattern matching?
I like the idea of being able to search for matches at the beginning or end of a string, but if REGEXP doesn't support Unicode then this could be difficult if my text is Unicode.
Does anyone know if MySQL's REGEXP supports Unicode? I've been doing some research, and the majority of blogs etc. seem to indicate that there is a problem or that it's not supported.
As documented under Regular Expressions:
Warning: The REGEXP and RLIKE operators work in byte-wise fashion, so they are not multi-byte safe and may produce unexpected results with multi-byte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.
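To make the warning concrete, here is a minimal sketch of what "byte-wise" means for a multi-byte character. It uses Python's `re` module purely as a stand-in; it is not MySQL's regex engine, and the variable names are made up for illustration. The idea is only that a pattern applied to raw UTF-8 bytes sees "é" as two units, while a character-aware matcher sees one.

```python
import re

text = "é"                   # one character, U+00E9
raw = text.encode("utf-8")   # two bytes: 0xC3 0xA9

# Character-wise matching: "." consumes one whole character.
char_match = re.fullmatch(r".", text) is not None

# Byte-wise matching: "." consumes a single byte, so the two-byte
# sequence cannot be covered by a one-unit pattern.
byte_match = re.fullmatch(rb".", raw) is not None

# Comparing by byte value also ignores any collation: "e" and "é"
# never match, even where a collation would treat them as equal.
accent_match = re.fullmatch(rb"e", raw) is not None

print(char_match, byte_match, accent_match)  # True False False
```

A byte-wise engine therefore tends to misbehave exactly where `.`, character classes, or quantifiers have to count characters, which is why plain ASCII patterns are usually unaffected.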
I'm wondering, then, is it best to use LIKE for Unicode pattern matching and REGEXP for ASCII enhanced pattern matching?
Yes, that would be best.
I like the idea of being able to search for matches at the beginning or end of a string, but if REGEXP doesn't support Unicode then this could be difficult if my text is Unicode.
One can do that with LIKE too. To match at the beginning of a string:
WHERE foo LIKE 'bar%'
And to match at the end:
WHERE foo LIKE '%bar'
Starting with MariaDB 10.0.5:
REGEXP/RLIKE, and the new functions REGEXP_REPLACE(), REGEXP_INSTR() and REGEXP_SUBSTR(), now work correctly with all multi-byte character sets supported by MariaDB, including East-Asian character sets (big5, gb2312, gbk, eucjp, eucjpms, cp932, ujis, euckr), and Unicode character sets (utf8, utf8mb4, ucs2, utf16, utf16le, utf32). In earlier versions of MariaDB (and all MySQL versions) REGEXP/RLIKE works correctly only with 8-bit character sets.