
JavaScript substring without splitting emoji

In my JS I am trying to substring() text, which generally works but unfortunately decapitates emoji.

usaText = "A🇺🇸Z"
splitText = usaText.substring(0,2) //"A�"
splitText = usaText.substring(0,3) //"A🇺"
splitText = usaText.substring(0,4) //"A🇺�"
splitText = usaText.substring(0,5) //"A🇺🇸"

Is there a way to use substring without breaking emoji? In my production code I cut at about 40 characters and I wouldn't mind if it was 35 or 45. I have thought about simply checking whether the 40th character is a number or between a-z but that wouldn't work if you got a text full of emojis. I could check whether the last character is one that "ends" an emoji by pattern matching but this also seems a bit weird performance-wise.

Am I missing something? With all the bloat that JavaScript carries, is there no built-in count that sees emoji as one?

Regarding the suggested "Split JavaScript string into array of codepoints? (taking into account "surrogate pairs" but not "grapheme clusters")" approach:

chrs = Array.from( usaText )
(4) ["A", "🇺", "🇸", "Z"]
0: "A"
1: "🇺"
2: "🇸"
3: "Z"
length: 4

That's one too many, unfortunately: the flag is split into its two regional-indicator code points.

user2875404 asked Sep 26 '18 22:09


3 Answers

This code has worked for me:

splitText = Array.from(usaText).slice(0, 5).join('');
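A minimal sketch of what this approach does with the string from the question. The helper name sliceCodePoints is made up for illustration. Array.from iterates by code point, so a cut never lands inside a surrogate pair (no "�"), but a flag made of two code points can still be cut in half:

```javascript
// Hypothetical helper (not part of the answer): keep the first `count` code points.
function sliceCodePoints(text, count) {
  // Array.from uses the string iterator, which yields whole code points.
  return Array.from(text).slice(0, count).join('');
}

const usaText = 'A🇺🇸Z';

console.log(usaText.length);              // 6 (UTF-16 code units)
console.log(Array.from(usaText).length);  // 4 (code points)
console.log(sliceCodePoints(usaText, 2)); // "A🇺" (valid text, but only half of the flag)
console.log(sliceCodePoints(usaText, 3)); // "A🇺🇸"
```

So with a count of 5 the example string comes back whole, and smaller counts may still leave a lone regional indicator at the end.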
hs_dino answered Nov 16 '22 06:11


This isn't an easy thing to do correctly, so I'd advise against writing it yourself. Use a library like runes instead.

Just run npm i runes, then:

const runes = require('runes');
const usaText = "A🇺🇸Z";
runes.substr(usaText, 0, 2); // "A🇺🇸"
MichaelSolati answered Nov 16 '22 08:11


Disclaimer: This is just extending the above comment by Mike 'Pomax' Kamermans because to me it is actually a much simpler, applicable answer (for those of us who don't like reading through all the comments):

Array.from(str) splits your string into individual Unicode code points without breaking surrogate pairs apart. Note that it still splits grapheme clusters made of several code points, such as flags.

See Split JavaScript string into array of codepoints? (taking into account "surrogate pairs" but not "grapheme clusters") for details.
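If you need to cut on user-perceived characters (grapheme clusters) rather than code points, newer runtimes ship Intl.Segmenter, which was not widely available when this question was asked. Here is a rough sketch, assuming your target browsers or Node.js version support it. The helper name truncateGraphemes is made up for illustration:

```javascript
// Hypothetical helper: keep at most `maxGraphemes` user-perceived characters.
// Assumes Intl.Segmenter is available in the runtime.
function truncateGraphemes(text, maxGraphemes) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  let result = '';
  let count = 0;
  // Each segment is one grapheme cluster, e.g. a whole flag.
  for (const { segment } of segmenter.segment(text)) {
    if (count >= maxGraphemes) break;
    result += segment;
    count += 1;
  }
  return result;
}

const usaText = 'A🇺🇸Z';

console.log(truncateGraphemes(usaText, 2)); // "A🇺🇸"
console.log(truncateGraphemes(usaText, 3)); // "A🇺🇸Z"
```

For a cut at about 40 characters, calling truncateGraphemes(text, 40) would stop early instead of segmenting the whole string.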

E. Villiger answered Nov 16 '22 07:11