Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

mysql: find cyrillic or greek text in database

I have a UTF8 table in MySQL containing names, with all types of text (numeric, capitals, greek, cycrillic etc).

---------------
ID   Name
---------------
001  Jane Smith
002  John Doe
003  Джемз Смис
004  Пэтра Смис
005  "Groove" Holme
006  99er Dude

How can I select only the cyrillic names? (records 003 and 004)

EDIT

Thanks for the answer below, which looked like it would be correct, but didn't work. More research turned up this in the documentation:

Warning

The REGEXP and RLIKE operators work in byte-wise fashion, so they are not multi-byte safe and may produce unexpected results with multi-byte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.

EDIT EDIT, A SOLUTION

I solved this by adding an extra field to my database which stores the script type, eg Cyrillic, Thai etc. Then a ran a batch process in PHP that detects the script and stores the information in the database.

To detect the script in PHP, use Unicode regex functions. See this page:

http://www.regular-expressions.info/unicode.html

like image 336
soupagain Avatar asked Jan 17 '23 20:01

soupagain


1 Answers

I don't know if it covers all characters, but you should be able to use this:

SELECT * FROM table WHERE Name REGEXP '[Α-Ωα-ωА-Яа-я]'

which will return all entries containing at least one cyrillic or greek character

EDIT

Check "Warning" in OP's question about REGEXP/RLIKE working in byte-wise fashion so they might not work for multi-byte character sets.

like image 145
boobiq Avatar answered Jan 29 '23 10:01

boobiq