Problem trimming Japanese string in Java

Question

I have the following Japanese string: "　ユーザー名". The first character looks like whitespace, but its Unicode code point is 12288 (U+3000), so "　ユーザー名".trim() returns the same string (trim has no effect). Trimming in C++ works fine. Does anyone know how to solve this in Java? Is there a special trim method for Unicode?

Fabian Steeg · Accepted Answer

As an alternative to the StringUtils class mentioned by Mike, you can also use a Unicode-aware regular expression, using only Java's own libraries:

"　ユーザー名".replaceAll("\\p{Z}", "")

Or, to trim only the ends and keep whitespace inside the string:

"　ユーザ ー名 ".replaceAll("(^\\p{Z}+|\\p{Z}+$)", "")
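The trim-only regex above can be wrapped in a small helper so the pattern is compiled once. This is a minimal sketch, not part of the original answer: the class name `UnicodeTrim` and the method name `trimUnicode` are made up for illustration and are not JDK APIs.

```java
import java.util.regex.Pattern;

class UnicodeTrim {
    // Leading or trailing run of Unicode separator characters.
    // \p{Z} covers the categories Zs, Zl and Zp, which include U+3000.
    private static final Pattern EDGE_SEPARATORS =
            Pattern.compile("^\\p{Z}+|\\p{Z}+$");

    // Hypothetical helper: removes separators at both ends only.
    static String trimUnicode(String s) {
        return EDGE_SEPARATORS.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // \u3000 is the ideographic space from the question.
        String input = "\u3000ユーザ ー名\u3000";
        // Brackets make any remaining edge whitespace visible.
        System.out.println("[" + trimUnicode(input) + "]");
    }
}
```

Note that `\p{Z}` does not match tabs or newlines, which are control characters rather than separators. If those should be trimmed too, a character class such as `[\\p{Z}\\s]` in place of `\\p{Z}` is one option.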

McDowell · Answer

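Have a look at Unicode normalization and the java.text.Normalizer class. NFKC normalization maps the ideographic space (U+3000) to an ordinary space (U+0020), which trim() then removes. The class is new in Java 6, but you'll find an equivalent version in the ICU4J library if you're on an earlier JRE.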

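    import java.text.Normalizer; // goes at the top of the file

    // U+3000 IDEOGRAPHIC SPACE, the leading character in the question's string
    int character = 12288;
    char[] ch = Character.toChars(character);
    String input = new String(ch);
    // NFKC maps U+3000 to an ordinary space (U+0020), which trim() removes
    String normalized = Normalizer.normalize(input, Normalizer.Form.NFKC);

    System.out.println("Hex value:\t" + Integer.toHexString(character));
    System.out.println("Trimmed length           :\t"
            + input.trim().length());
    System.out.println("Normalized trimmed length:\t"
            + normalized.trim().length());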
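One thing to keep in mind with the normalization approach: NFKC rewrites more than whitespace, so full-width letters elsewhere in the string are converted as well. The sketch below illustrates this; the class and method names are hypothetical. It also shows `String.strip()`, which is only available on Java 11 and later, as a possible alternative when only the edges should change.

```java
import java.text.Normalizer;

class NfkcSideEffects {
    // Normalize to NFKC, then trim: the approach described above.
    static String normalizeAndTrim(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC).trim();
    }

    public static void main(String[] args) {
        // U+3000 (ideographic space) followed by full-width "ABC" (U+FF21..U+FF23).
        String input = "\u3000\uFF21\uFF22\uFF23";

        // NFKC maps the full-width letters to ASCII, not just the space to U+0020.
        System.out.println(normalizeAndTrim(input)); // prints "ABC" in ASCII

        // strip() (Java 11+) removes characters for which Character.isWhitespace
        // is true, which includes U+3000, and leaves the full-width letters as-is.
        System.out.println(input.strip());
    }
}
```

Whether the extra conversions are welcome depends on the application: they help when matching user input loosely, but they alter text that is meant to be stored exactly as entered.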

Tags: java, string, nlp

Pablo Retyk
