Using Java, how can I detect whether a String contains Chinese characters?
String chineseStr = "已下架";
if (isChineseString(chineseStr)) {
    System.out.println("The string contains Chinese characters");
} else {
    System.out.println("The string does not contain Chinese characters");
}
Can you please help me to solve the problem?
Character.isIdeographic(int codepoint) tells you whether the code point is a CJKV (Chinese, Japanese, Korean and Vietnamese) ideograph.
A closer fit is to test for Character.UnicodeScript.HAN.
So:
System.out.println(containsHanScript("xxx已下架xxx"));

public static boolean containsHanScript(String s) {
    for (int i = 0; i < s.length(); ) {
        int codepoint = s.codePointAt(i);
        i += Character.charCount(codepoint);
        if (Character.UnicodeScript.of(codepoint) == Character.UnicodeScript.HAN) {
            return true;
        }
    }
    return false;
}
Or in Java 8:
public static boolean containsHanScript(String s) {
    return s.codePoints().anyMatch(
            codepoint ->
                    Character.UnicodeScript.of(codepoint) == Character.UnicodeScript.HAN);
}
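To see how the two checks mentioned in this answer can differ, here is a minimal sketch that prints both results for a few sample code points. The class name HanVsIdeographicDemo and the helper isHan are made up for illustration; only Character.isIdeographic and Character.UnicodeScript come from the JDK. Code points are given as hex values so the file compiles regardless of source encoding.

```java
final class HanVsIdeographicDemo {

    // Hypothetical helper: true if the code point belongs to the Han script.
    static boolean isHan(int codePoint) {
        return Character.UnicodeScript.of(codePoint) == Character.UnicodeScript.HAN;
    }

    public static void main(String[] args) {
        int[] samples = {
            0x5DF2, // CJK unified ideograph, first character of the question's sample string
            0x2F00, // KANGXI RADICAL ONE
            0x3042, // HIRAGANA LETTER A
            'A'     // LATIN CAPITAL LETTER A
        };
        for (int cp : samples) {
            System.out.printf("U+%04X  isIdeographic=%b  HAN=%b%n",
                    cp, Character.isIdeographic(cp), isHan(cp));
        }
    }
}
```

Neither check separates Chinese from Japanese kanji, since both are written with Han characters; if that distinction matters, the script test alone is probably not enough.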
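A more direct approach uses a regular expression. Note that matches() tests the whole string, so this checks that every character lies in the basic CJK range U+4E00 to U+9FA5, not merely that the string contains one:
if ("粽子".matches("[\\u4E00-\\u9FA5]+")) {
    System.out.println("is Chinese");
}
If you also need to catch rarely used and exotic characters, you'll need to add all the other ranges; see the question "What's the complete range for Chinese characters in Unicode?"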
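As an alternative to listing ranges by hand, java.util.regex supports Unicode script classes, so \p{IsHan} can stand in for the explicit range. The sketch below is one possible way to write both a "contains" and an "all characters" check; the class name HanRegexDemo and its method names are invented for this example. Strings are written with Unicode escapes so the file compiles regardless of source encoding.

```java
import java.util.regex.Pattern;

final class HanRegexDemo {

    // Matches a single code point whose Unicode script is Han.
    private static final Pattern ANY_HAN = Pattern.compile("\\p{IsHan}");

    // Matches a non-empty string made up only of Han code points.
    private static final Pattern ALL_HAN = Pattern.compile("\\p{IsHan}+");

    // find() succeeds as soon as one Han character appears anywhere in s.
    static boolean containsHan(String s) {
        return ANY_HAN.matcher(s).find();
    }

    // matches() requires the whole string to consist of Han characters.
    static boolean isAllHan(String s) {
        return ALL_HAN.matcher(s).matches();
    }

    public static void main(String[] args) {
        // U+5DF2 U+4E0B U+67B6 is the sample string from the question.
        String mixed = "xxx\u5DF2\u4E0B\u67B6xxx";
        System.out.println(containsHan(mixed));
        System.out.println(isAllHan(mixed));

        // U+20000 lies outside U+4E00 to U+9FA5, written here as a surrogate pair.
        String extensionB = "\uD840\uDC00";
        System.out.println(containsHan(extensionB));
        System.out.println(extensionB.matches("[\\u4E00-\\u9FA5]+"));
    }
}
```

The script class follows whatever Unicode version the running JDK ships with, so very recently added characters may not be recognized on older runtimes.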