Split UTF-16 String into single chars/strings

Tags: java, utf-16

I have a string that looks like this: a👏b🙂c. I want to split it into single characters, each as its own string.

static List<String> split(String text ) {
    List<String> list = new ArrayList<>(text.length());
    for(int i = 0; i < text.length() ; i++) {
        list.add(text.substring(i, i + 1));
    }
    return list;
}

public static void main(String... args) {
    split("a\uD83D\uDC4Fb\uD83D\uDE42c")
            .forEach(System.out::println);
}

As you might have noticed, instead of 👏 and 🙂 I get two weird characters for each emoji:

a
?
?
b
?
?
c

asked Dec 14 '22 15:12 by MAGx2


1 Answer

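As per the Character and String API docs, you need to work with code points to handle UTF-16 surrogate pairs correctly. Each emoji here lies outside the Basic Multilingual Plane, so Java stores it as two char values, and splitting by char index cuts the pair in half.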

"a👏b🙂c".codePoints().mapToObj(Character::toChars).forEach(System.out::println);

will output

a
👏
b
🙂
c
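
If you need the List<String> that the question's split method returns, rather than printed output, the same idea can be sketched as below. The class name CodePointSplitter is only illustrative, and this assumes Java 8 or later for codePoints(). It splits by code point, so it keeps surrogate pairs together but makes no attempt to group sequences made of several code points.

```java
import java.util.List;
import java.util.stream.Collectors;

public class CodePointSplitter {

    // Splits text into one String per Unicode code point, so a surrogate
    // pair (two char values) stays together as a single list element.
    public static List<String> split(String text) {
        return text.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .collect(Collectors.toList());
    }

    public static void main(String... args) {
        String text = "a\uD83D\uDC4Fb\uD83D\uDE42c";
        // length() counts UTF-16 code units: prints 7
        System.out.println(text.length());
        // codePointCount() counts code points: prints 5
        System.out.println(text.codePointCount(0, text.length()));
        split(text).forEach(System.out::println);
    }
}
```

The difference between the two counts (7 versus 5) is exactly why the loop over text.length() in the question yields seven pieces, four of them unpaired surrogates.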

answered Dec 16 '22 06:12 by Karol Dowbecki