I'm looking to count the number of perceived emoji characters in a provided Java string. I'm currently using the emoji4j library, but it doesn't work for grapheme clusters like this one: 👩👩👦👦
Calling EmojiUtil.getLength("👩👩👦👦") returns 4 instead of 1, and similarly calling EmojiUtil.getLength("👻👩👩👦👦") returns 5 instead of 2.
Are there any APIs or methods on String in Java that make it easy to count grapheme clusters?
I've been hunting around but understandably the codePoints() method on a String includes not only the visible emojis, but also the zero width joiners.
I also attempted this using the BreakIterator:
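    import java.text.BreakIterator;

    // Counts the boundaries reported by the JDK's character-level BreakIterator.
    public static int getLength(String emoji) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(emoji);
        int emojiCount = 0;
        // Each call to next() advances to the following character boundary.
        while (it.next() != BreakIterator.DONE) {
            emojiCount++;
        }
        return emojiCount;
    }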
But it seems to behave identically to the codePoints() method, returning 8 for something like "👻👩👩👦👦".
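To make the count of 8 concrete, here is a small sketch that spells the ghost-plus-family string out with explicit escapes, so the zero width joiners (U+200D) are visible in the source. The class and method names are only illustrative; the only API used is String.codePoints().

```java
import java.util.stream.Collectors;

class CodePointBreakdown {
    // Ghost, followed by woman + ZWJ + woman + ZWJ + boy + ZWJ + boy.
    static final String SAMPLE =
        "\uD83D\uDC7B"
        + "\uD83D\uDC69\u200D\uD83D\uDC69\u200D\uD83D\uDC66\u200D\uD83D\uDC66";

    // Number of Unicode code points, joiners included.
    static long codePointCount(String s) {
        return s.codePoints().count();
    }

    // Number of zero width joiners (U+200D) in the string.
    static long zwjCount(String s) {
        return s.codePoints().filter(cp -> cp == 0x200D).count();
    }

    // Lists every code point as U+XXXX, separated by spaces.
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(describe(SAMPLE));
        System.out.println("code points: " + codePointCount(SAMPLE)); // 8
        System.out.println("joiners:     " + zwjCount(SAMPLE));       // 3
    }
}
```

Five visible emoji plus three joiners gives the 8 reported above, even though a reader perceives only two characters.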
Grapheme, or more fully, a grapheme cluster string is a single user-visible character, which in turn may be several characters (codepoints) long. For example … a “ȫ” is a single grapheme but one, two, or even three characters, depending on normalization.
I ended up using the ICU library, which worked much better. No changes (aside from import statements) were needed from my original code block, as it simply provides a different implementation of BreakIterator.
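For readers curious what a ZWJ-aware segmentation has to do differently, the following is a rough, standard-library-only sketch. It is not the ICU algorithm and not a full implementation of the Unicode grapheme cluster rules: it only merges code points glued together by a zero width joiner and folds variation selectors and skin-tone modifiers into the preceding emoji. Flags, combining marks, and other cases are deliberately ignored, and the class and method names are hypothetical. For real use, the ICU BreakIterator mentioned above is the safer choice.

```java
class ZwjAwareCounter {
    private static final int ZWJ = 0x200D;

    // Variation selectors and skin-tone modifiers attach to the preceding emoji.
    private static boolean isExtender(int cp) {
        return (cp >= 0xFE00 && cp <= 0xFE0F)     // variation selectors
            || (cp >= 0x1F3FB && cp <= 0x1F3FF);  // skin-tone modifiers
    }

    // Approximate count of perceived characters (simplified rules, see above).
    static int countClusters(String s) {
        int count = 0;
        boolean joinNext = false; // true when the previous code point was a ZWJ
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == ZWJ) {
                joinNext = count > 0; // a leading ZWJ has nothing to join to
                continue;
            }
            if (isExtender(cp) && count > 0) {
                continue; // belongs to the cluster already counted
            }
            if (!joinNext) {
                count++; // starts a new cluster
            }
            joinNext = false;
        }
        return count;
    }
}
```

Under these simplified rules, the family sequence collapses into a single cluster, and the ghost followed by the family yields two.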