Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to check whether given text is english or chinese in android?

Tags:

android

I am designing one android application in English and Chinese both. I want to know whether the user type English text or Chinese text?. Is there any way to check this in android?

like image 500
Yatish Bathla Avatar asked Jul 10 '14 09:07

Yatish Bathla


People also ask

How do you identify Chinese characters?

Optical character recognition (OCR) – Many apps and websites provide OCR features where you can scan or take pictures of the character(s) you want to look up. Google Docs has such a feature and there are others online you can easily find by searching for “Chinese” and “OCR”.


2 Answers

If you want to detect whether the input string contains Chinese-like character(s) (CJK), the following may help you:

public static boolean isCJK(String str){
        int length = str.length();
        for (int i = 0; i < length; i++){
            char ch = str.charAt(i);
            Character.UnicodeBlock block = Character.UnicodeBlock.of(ch);
            if (Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS.equals(block)|| 
                Character.UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS.equals(block)|| 
                Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A.equals(block)){
                return true;
            }
        }
        return false;
    }
like image 111
hungr Avatar answered Sep 24 '22 15:09

hungr


The accepted answer is either incomplete or outdated. Here are a few methods you can use to test if a character is a CJK Ideograph. My fuller answer is here.

It is better to use the codepoint rather than charAt (as in the accepted answer) because many Chinese characters are in a higher code plane. Using charAt will just give you one of the surrogate pairs rather than the actual Chinese character. So a better way to loop through a String is like this:

final int length = myString.length();
for (int offset = 0; offset < length; ) {
    final int codepoint = Character.codePointAt(myString, offset);

    // use codepoint here

    offset += Character.charCount(codepoint);
}

And testing the codepoints can be done in one of the following ways.

private boolean isCJK(int codepoint) {
    Character.UnicodeBlock block = Character.UnicodeBlock.of(codepoint);
    return (Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS.equals(block)||
            Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A.equals(block) ||
            Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B.equals(block) ||
            Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_C.equals(block) || // api 19, remove these if supporting lower versions
            Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_D.equals(block) || // api 19
            Character.UnicodeBlock.CJK_COMPATIBILITY.equals(block) ||
            Character.UnicodeBlock.CJK_COMPATIBILITY_FORMS.equals(block) ||
            Character.UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS.equals(block) ||
            Character.UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS_SUPPLEMENT.equals(block) ||
            Character.UnicodeBlock.CJK_RADICALS_SUPPLEMENT.equals(block) ||
            Character.UnicodeBlock.CJK_STROKES.equals(block) ||                        // api 19
            Character.UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION.equals(block) ||
            Character.UnicodeBlock.ENCLOSED_CJK_LETTERS_AND_MONTHS.equals(block) ||
            Character.UnicodeBlock.ENCLOSED_IDEOGRAPHIC_SUPPLEMENT.equals(block) ||    // api 19
            Character.UnicodeBlock.KANGXI_RADICALS.equals(block) ||
            Character.UnicodeBlock.IDEOGRAPHIC_DESCRIPTION_CHARACTERS.equals(block));
}

Or for API 19

private boolean isCJK(int codepoint) {
    return Character.isIdeographic(codepoint);
}

Or for API 24

private boolean isCJK(int codepoint) {
    return (Character.UnicodeScript.of(codepoint) == Character.UnicodeScript.HAN);
}
like image 34
Suragch Avatar answered Sep 24 '22 15:09

Suragch