
Detect Chinese characters in Java

Using Java, how can I detect whether a String contains Chinese characters?

String chineseStr = "已下架";

if (isChineseString(chineseStr)) {
    System.out.println("The string contains Chinese characters");
} else {
    System.out.println("The string does not contain Chinese characters");
}

Can you please help me to solve the problem?

Ran Deloun asked Oct 14 '14 09:10



2 Answers

Since Java 7, Character.isIdeographic(int codepoint) tells whether the code point is a CJKV (Chinese, Japanese, Korean and Vietnamese) ideograph.

A closer fit is to test for Character.UnicodeScript.HAN.

So:

System.out.println(containsHanScript("xxx已下架xxx"));

public static boolean containsHanScript(String s) {
    for (int i = 0; i < s.length(); ) {
        int codepoint = s.codePointAt(i);
        i += Character.charCount(codepoint);
        if (Character.UnicodeScript.of(codepoint) == Character.UnicodeScript.HAN) {
            return true;
        }
    }
    return false;
}

Or, in Java 8:

public static boolean containsHanScript(String s) {
    return s.codePoints().anyMatch(
            codepoint ->
            Character.UnicodeScript.of(codepoint) == Character.UnicodeScript.HAN);
}
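
To see how the two checks mentioned above behave, they can be tried side by side. This is only a minimal sketch: the class name HanCheckDemo and the helper containsIdeograph are made up for illustration, and it assumes Java 8 or later for the stream API.

```java
final class HanCheckDemo {
    // Hypothetical helper: true if any code point has the Unicode Ideographic property.
    static boolean containsIdeograph(String s) {
        return s.codePoints().anyMatch(Character::isIdeographic);
    }

    // Same check as containsHanScript above: true if any code point is in the Han script.
    static boolean containsHanScript(String s) {
        return s.codePoints().anyMatch(
                cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    public static void main(String[] args) {
        // Sample inputs: Han characters, plain Latin letters, Japanese hiragana.
        String[] samples = {"已下架", "abc", "ひらがな"};
        for (String s : samples) {
            System.out.println(s
                    + " ideographic=" + containsIdeograph(s)
                    + " han=" + containsHanScript(s));
        }
    }
}
```

Neither check can tell Chinese apart from Japanese kanji or Korean hanja, since all of them share the Han script; both only establish that Han characters are present.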
Joop Eggen answered Sep 19 '22 20:09


A more direct approach:

if ("粽子".matches("[\\u4E00-\\u9FA5]+")) {
    System.out.println("is Chinese");
}

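Note that matches() succeeds only if the entire string consists of characters in this range, which covers the basic CJK Unified Ideographs. If you also need to catch rarely used and exotic characters, then you'll need to add all the ranges; see "What's the complete range for Chinese characters in Unicode?"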
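
As an alternative to listing ranges by hand, java.util.regex (Java 7 and later) accepts the script class \p{IsHan}. The sketch below is an illustration rather than part of the original answer; the class name HanRegexDemo and the methods containsHan and isAllHan are hypothetical. It also shows the difference between find(), which looks for a match anywhere in the string, and matches(), which requires the whole string to match.

```java
import java.util.regex.Pattern;

final class HanRegexDemo {
    // \p{IsHan} matches a single code point belonging to the Han script.
    private static final Pattern ANY_HAN = Pattern.compile("\\p{IsHan}");
    private static final Pattern ONLY_HAN = Pattern.compile("\\p{IsHan}+");

    // True if at least one Han character occurs anywhere in the string.
    static boolean containsHan(String s) {
        return ANY_HAN.matcher(s).find();
    }

    // True only if the string is non-empty and made up entirely of Han characters.
    static boolean isAllHan(String s) {
        return ONLY_HAN.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsHan("xxx已下架xxx"));
        System.out.println(isAllHan("xxx已下架xxx"));
        System.out.println(isAllHan("粽子"));
    }
}
```

For the question's use case ("does the string contain Chinese characters"), containsHan is the variant to use; isAllHan corresponds to the matches() call shown in this answer.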

ccpizza answered Sep 19 '22 20:09