In my JS I am trying to use substring() on some text, which generally works but unfortunately decapitates emojis.
usaText = "A🇺🇸Z"
splitText = usaText.substring(0,2) //"A�"
splitText = usaText.substring(0,3) //"A🇺"
splitText = usaText.substring(0,4) //"A🇺�"
splitText = usaText.substring(0,5) //"A🇺🇸"
Is there a way to use substring without breaking emoji? In my production code I cut at about 40 characters and I wouldn't mind if it was 35 or 45. I have thought about simply checking whether the 40th character is a number or between a-z but that wouldn't work if you got a text full of emojis. I could check whether the last character is one that "ends" an emoji by pattern matching but this also seems a bit weird performance-wise.
Am I missing something? With all the bloat that JavaScript carries, is there no built-in count that sees an emoji as one character?
Regarding the linked question, "Split JavaScript string into array of codepoints? (taking into account "surrogate pairs" but not "grapheme clusters")":
chrs = Array.from( usaText )
(4) ["A", "🇺", "🇸", "Z"]
0: "A"
1: "🇺"
2: "🇸"
3: "Z"
length: 4
That's one element too many, unfortunately: the flag is split into its two regional indicator symbols.
This code has worked for me:
splitText = Array.from(usaText).slice(0, 5).join('');
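A minimal sketch of the same idea wrapped in a function, to show what it does and does not protect against. The helper name truncateByCodePoints is made up for illustration. Note that the sample string only has four code points, so slice(0, 5) above returns it whole. Cutting by code points never leaves half a surrogate pair behind, but it can still cut a flag between its two regional indicators:

```javascript
// Hypothetical helper: keep at most `maxCodePoints` Unicode code points.
// Array.from iterates a string by code point, so surrogate pairs stay intact.
function truncateByCodePoints(text, maxCodePoints) {
  return Array.from(text).slice(0, maxCodePoints).join('');
}

const usaText = "A🇺🇸Z";

// Two code points: "A" plus the first regional indicator only.
// No broken surrogate, but the flag is still cut in half.
const halfFlag = truncateByCodePoints(usaText, 2);

// Three code points: "A" plus both regional indicators, i.e. "A🇺🇸".
const fullFlag = truncateByCodePoints(usaText, 3);

console.log(halfFlag, fullFlag);
```

Since the question allows some slack (35 to 45 characters), this may already be good enough when a half-rendered flag is acceptable.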
So this isn't really an easy thing to do, and I'm inclined to tell you that you shouldn't write this on your own. You should use a library like runes.
Just a simple npm i runes, then:
const runes = require('runes');
const usaText = "A🇺🇸Z";
runes.substr(usaText, 0, 2); // "A🇺🇸"
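If adding a dependency is not an option, a similar result can probably be had with the built-in Intl.Segmenter, assuming the target runtime supports it (recent browsers and Node.js versions do, older ones do not). The sketch below is not how runes works internally; the helper name truncateByGraphemes is an assumption for illustration:

```javascript
// Hypothetical helper: keep at most `maxGraphemes` user-perceived characters.
// Assumes Intl.Segmenter is available in the runtime.
function truncateByGraphemes(text, maxGraphemes) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  let result = '';
  let count = 0;
  // Each iteration yields one grapheme cluster; a flag made of two
  // regional indicators arrives as a single segment.
  for (const { segment } of segmenter.segment(text)) {
    if (count >= maxGraphemes) break;
    result += segment;
    count += 1;
  }
  return result;
}

const flagText = "A🇺🇸Z";
const firstTwo = truncateByGraphemes(flagText, 2); // "A🇺🇸"
console.log(firstTwo);
```

Stopping the loop early means a long text is only segmented up to the cut point, which matters when cutting at about 40 characters out of a much longer string.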
Disclaimer: This is just extending the above comment by Mike 'Pomax' Kamermans because to me it is actually a much simpler, applicable answer (for those of us who don't like reading through all the comments):
Array.from(str) splits your string into individual Unicode code points without breaking surrogate pairs apart. An emoji built from several code points, such as a flag, still ends up as several elements.
See Split JavaScript string into array of codepoints? (taking into account "surrogate pairs" but not "grapheme clusters") for details.
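To make the distinction concrete, here is a small sketch that counts the same string three ways. The function name countUnits is made up for illustration, and the grapheme count assumes Intl.Segmenter exists in the runtime:

```javascript
// Hypothetical helper: report the three different "lengths" of a string.
function countUnits(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  return {
    codeUnits: text.length,                                  // UTF-16 code units, what substring() indexes
    codePoints: Array.from(text).length,                     // Unicode code points
    graphemes: Array.from(segmenter.segment(text)).length,   // user-perceived characters
  };
}

const counts = countUnits("A🇺🇸Z");
console.log(counts); // { codeUnits: 6, codePoints: 4, graphemes: 3 }
```

The flag takes four code units and two code points, which is why substring() can cut it in three wrong places and Array.from() in one.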