I'm attempting to split a string into single words/chars, but I'm having trouble when it comes to emoji.
First of all, I can't simply split the string on an empty string, because most emoji have a length of 2 or more.
"😎".split("")
["�", "�"]
I found an emoji regex that mostly works, but now I am seeing some strange flesh-colored blocks. I even see them show up on twitter in some cases.
Here's a pen that illustrates the problem with the fleshy blocks http://codepen.io/positlabs/pen/QyEOEG?editors=011
UPDATE -----------
Trying out spliddit, and I'm still seeing the issue with the skin tone characters. Is there some way to glue them back together?
http://codepen.io/positlabs/pen/rxLqwL?editors=001
JavaScript's strings are UTF-16, so your emoji is internally represented as two code units:
> "\ud83d\ude0e" === "😎"
true
The String.prototype.split function doesn't really care about surrogate pairs in UTF-16, so it naively splits between the individual code units and breaks your emoji. split("") works on code units, not on whole characters.
There's no easy way to deal with it using split("") alone. You need something like spliddit, which keeps surrogate pairs together instead of treating each code unit separately.
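For the surrogate-pair part of the problem, a minimal sketch, assuming an ES2015+ environment: the built-in string iterator walks code points rather than code units, so Array.from keeps each pair intact. The helper name splitByCodePoint is made up for illustration. This does not handle multi-code-point sequences such as skin tone modifiers.

```javascript
// Hypothetical helper: split a string by code point instead of by UTF-16 code unit.
// Array.from consumes the string iterator, which yields whole surrogate pairs.
function splitByCodePoint(str) {
  return Array.from(str);
}

// split("") cuts the pair in half, leaving two lone surrogates.
console.log("😎".split("").length);          // 2
// Iterating by code point keeps the emoji in one piece.
console.log(splitByCodePoint("😎").length);   // 1
console.log(splitByCodePoint("a😎b"));        // three elements: "a", "😎", "b"
```

The spread form [..."😎"] relies on the same iterator and behaves the same way.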
I'm not 100% familiar with the terminology, so please edit my answer as needed.
spliddit currently can't correctly split, for example, this Hindi text into its 5 characters: "अनुच्छेद"
You need the grapheme-splitter library: https://github.com/orling/grapheme-splitter It is a full implementation of the UAX-29 Unicode standard and will split even the most exotic letters, emoji being just one of many use cases.
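To show what splitting by grapheme cluster looks like, here is a rough sketch that uses the built-in Intl.Segmenter instead of the grapheme-splitter library, so it runs without installing anything. It assumes a runtime that ships Intl.Segmenter (recent browsers and recent Node.js versions), and the helper name splitGraphemes is made up for illustration. Results for some scripts can differ between runtimes, because they depend on the Unicode version the runtime bundles.

```javascript
// Hypothetical helper: split a string into grapheme clusters (user-perceived characters).
// Assumption: Intl.Segmenter is available in the current runtime.
function splitGraphemes(str, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  // segment() returns an iterable of objects; the text of each piece is in .segment.
  return Array.from(segmenter.segment(str), (part) => part.segment);
}

// Thumbs up (U+1F44D) followed by a medium skin tone modifier (U+1F3FD).
const thumbs = "\u{1F44D}\u{1F3FD}";
console.log(Array.from(thumbs).length);      // 2 code points: the modifier is split off
console.log(splitGraphemes(thumbs).length);  // 1 grapheme cluster: the modifier stays attached
```

Splitting at grapheme boundaries is what keeps the skin tone modifier attached to its emoji, which is why the loose skin tone blocks disappear.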