I've got an input box that allows UTF8 characters -- can I detect whether the characters are in Chinese, Japanese, or Korean programmatically (part of some Unicode range, perhaps)? I would change search methods depending on if MySQL's fulltext searching would work (it won't work for CJK characters).
Thanks!
// is chinese, japanese or korean language
function isCjk($string) {
return isChinese($string) || isJapanese($string) || isKorean($string);
}
function isChinese($string) {
return preg_match("/\p{Han}+/u", $string);
}
function isJapanese($string) {
return preg_match('/[\x{4E00}-\x{9FBF}\x{3040}-\x{309F}\x{30A0}-\x{30FF}]/u', $string);
}
function isKorean($string) {
return preg_match('/[\x{3130}-\x{318F}\x{AC00}-\x{D7AF}]/u', $string);
}
CJK characters are restricted to certain Unicode Blocks. You need to check the characters if they are inside these blocks, and should consider surrogates (32bit characters) too.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With