
Detect CJK characters in PHP

I've got an input box that accepts UTF-8 text. Can I programmatically detect whether the characters are Chinese, Japanese, or Korean (for example, by checking whether they fall in certain Unicode ranges)? I would switch search methods depending on whether MySQL's fulltext search will work (it won't for CJK characters).

Thanks!

atp asked Apr 08 '10 09:04



2 Answers

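// True if the string contains at least one Chinese, Japanese, or Korean character.
function isCjk($string) {
    return isChinese($string) || isJapanese($string) || isKorean($string);
}

// Han ideographs. These are shared by Chinese, Japanese (kanji), and Korean (hanja),
// so a match here does not prove the text is Chinese.
function isChinese($string) {
    // preg_match() returns 1, 0, or false; compare with 1 to get a real boolean.
    return preg_match('/\p{Han}/u', $string) === 1;
}

// CJK Unified Ideographs (U+4E00-U+9FFF), Hiragana (U+3040-U+309F), Katakana (U+30A0-U+30FF).
// Because the ideograph block is included, Chinese text matches here as well.
function isJapanese($string) {
    return preg_match('/[\x{4E00}-\x{9FFF}\x{3040}-\x{309F}\x{30A0}-\x{30FF}]/u', $string) === 1;
}

// Hangul Compatibility Jamo (U+3130-U+318F) and Hangul Syllables (U+AC00-U+D7AF).
function isKorean($string) {
    return preg_match('/[\x{3130}-\x{318F}\x{AC00}-\x{D7AF}]/u', $string) === 1;
}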
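
For the use case in the question (picking a search method), a check like this could be wrapped in a small helper. This is only a sketch: `chooseSearchMode` and the `'like'` / `'fulltext'` labels are hypothetical names, and falling back to a `LIKE` query for CJK input is an assumption, not something the question specifies. It relies on PCRE's Unicode script properties and assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: decide which search strategy to use for a term.
// Returns 'like' when the term contains any Han, Hiragana, Katakana, or
// Hangul character, and 'fulltext' otherwise.
function chooseSearchMode(string $term): string
{
    // The /u modifier makes PCRE treat the pattern and subject as UTF-8.
    $hasCjk = preg_match('/[\p{Han}\p{Hiragana}\p{Katakana}\p{Hangul}]/u', $term) === 1;

    return $hasCjk ? 'like' : 'fulltext';
}
```

The caller would then build either a `MATCH ... AGAINST` query or a `LIKE` query depending on the returned label.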
Mantas D answered Sep 19 '22 17:09



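CJK characters are confined to specific Unicode blocks. Check whether each character falls inside one of these blocks, and remember the supplementary-plane characters too: code points above U+FFFF, such as CJK Unified Ideographs Extension B, take 4 bytes in UTF-8 and a surrogate pair in UTF-16.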
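
A minimal sketch of that block check, assuming PHP 7.2 or later with the mbstring extension (for `mb_ord`). The constant `CJK_BLOCKS` and the function `containsCjkBlockChar` are names made up for illustration, and the block list is deliberately not exhaustive; extend it from the Unicode block charts as needed.

```php
<?php
// Illustrative, non-exhaustive list of [start, end] code point ranges.
const CJK_BLOCKS = [
    [0x1100, 0x11FF],   // Hangul Jamo
    [0x3040, 0x309F],   // Hiragana
    [0x30A0, 0x30FF],   // Katakana
    [0x3130, 0x318F],   // Hangul Compatibility Jamo
    [0x3400, 0x4DBF],   // CJK Unified Ideographs Extension A
    [0x4E00, 0x9FFF],   // CJK Unified Ideographs
    [0xAC00, 0xD7AF],   // Hangul Syllables
    [0xF900, 0xFAFF],   // CJK Compatibility Ideographs
    [0x20000, 0x2A6DF], // CJK Unified Ideographs Extension B (above U+FFFF)
];

// True if any code point of $text lies inside one of the listed blocks.
function containsCjkBlockChar(string $text): bool
{
    // Split into whole UTF-8 characters, not bytes.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return false; // input was not valid UTF-8
    }

    foreach ($chars as $char) {
        $codePoint = mb_ord($char, 'UTF-8');
        foreach (CJK_BLOCKS as [$start, $end]) {
            if ($codePoint >= $start && $codePoint <= $end) {
                return true;
            }
        }
    }

    return false;
}
```

Because the comparison is done on code points, characters outside the Basic Multilingual Plane need no special handling here; surrogate pairs only matter if the text is processed as UTF-16.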

devio answered Sep 21 '22 17:09
