I want to validate the name when a new user signs up on my page. One of the checks is that the name is no longer than 100 characters.
But a single emoji like 👩❤️💋👩 (is that actually four emoji joined together? see screenshot) counts as far more than one character, so I have trouble validating the name. I want to allow emoji in names, because these days it's quite common to have a heart, a star, or something similar there, but I don't want to allow names with more than 100 characters.
So I have this question:
PS: I'm looking for a PHP solution, but I would also accept JavaScript, even though I don't prefer it.
Edit: My example emoji seems to be this string: \ud83d\udc69\u200d\u2764\ufe0f\u200d\ud83d\udc8b\u200d\ud83d\udc69
Please note the screenshot mentioned in the question: [screenshot not preserved]
All emoji regardless of gender, race, eggplant, or flag will count as a total of two characters.
Emoji are “picture characters” originally associated with cellular telephone usage in Japan, but now popular worldwide. The word emoji comes from the Japanese 絵 (e ≅ picture) + 文字 (moji ≅ written character).
> Most of the emoji are 3-byte Unicode characters. The most recent Emoji standard has 1,182 characters classified as Emoji and 179 of them are in the BMP [1]. Others are encoded as 4 bytes in any UTF encodings.
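To make the byte counts in the quote above concrete, here is a minimal sketch that measures the UTF-8 size of one emoji inside the BMP and one outside it. The helper name `utf8ByteLength` is made up for illustration; `TextEncoder` is the standard encoder available in browsers and Node.js.

```javascript
// Hypothetical helper: number of bytes a string occupies when encoded as UTF-8.
const encoder = new TextEncoder();

function utf8ByteLength(text) {
  return encoder.encode(text).length;
}

const heart = '\u2764';     // U+2764 HEAVY BLACK HEART, inside the BMP
const woman = '\u{1F469}';  // U+1F469 WOMAN, outside the BMP

console.log(utf8ByteLength(heart)); // 3
console.log(utf8ByteLength(woman)); // 4
console.log(woman.length);          // 2 (a surrogate pair in UTF-16)
```

Neither number says anything about how many characters a user sees, which is why byte or code-unit counts are a poor basis for a name-length limit.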
Emojis. Emoji supported by twemoji always count as two characters, regardless of combining modifiers.
As a potential JavaScript solution (if you don't mind adding a library), Lodash has tackled this problem in its toArray module.
For example,
_.toArray('12👪').length; // --> 3
Or, if you want to trim a string to a fixed number of glyphs, you can slice the array and rejoin it, like:
_.toArray("👪trimToEightGlyphs").slice(0, 8).join(''); // --> '👪trimToE'
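If adding a library is not an option, a similar count can be sketched with the built-in `Intl.Segmenter` (available in modern browsers and Node.js 16+). This is a different technique from the Lodash approach above, not a description of how Lodash works internally. The function names `graphemeLength` and `isValidName` are hypothetical, and the 100-character limit is taken from the question.

```javascript
// Segmenter that splits text into grapheme clusters (user-perceived characters).
const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });

// Hypothetical helper: count grapheme clusters in a string.
function graphemeLength(text) {
  let count = 0;
  for (const segment of segmenter.segment(text)) {
    count += 1;
  }
  return count;
}

// Hypothetical validator: assumes a name needs 1 to 100 visible characters.
function isValidName(name, maxGraphemes = 100) {
  const length = graphemeLength(name);
  return length > 0 && length <= maxGraphemes;
}

// The example emoji from the question, written as UTF-16 escapes.
const kiss = '\ud83d\udc69\u200d\u2764\ufe0f\u200d\ud83d\udc8b\u200d\ud83d\udc69';

console.log(kiss.length);          // 11 UTF-16 code units
console.log([...kiss].length);     // 8 code points
console.log(graphemeLength(kiss)); // 1 grapheme cluster
console.log(isValidName(kiss));    // true
```

The exact grapheme count depends on the Unicode data shipped with the runtime, so very old environments may split newer emoji sequences differently. A client-side check like this is also only a convenience; the same limit should still be enforced on the server.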
Unicode defines abstract characters as code points, but what renders them on screen is the font. A font is a collection of graphical shapes, called glyphs, which are the visual representation of a code point or a sequence of code points. A sequence of one or more code points that is displayed as a single graphical unit is called a grapheme.
If you need to get the length in grapheme units (and NOT characters, like mb_strlen would do), you can use grapheme_strlen:
$emoji = "\u{1F469}\u{200D}\u{2764}\u{FE0F}\u{200D}\u{1F48B}\u{200D}\u{1F469}";
echo $emoji , " : " , strlen($emoji) , "\n"; // 27, count bytes
echo $emoji , " : " , mb_strlen($emoji) , "\n"; // 8, count characters
echo $emoji , " : " , grapheme_strlen($emoji) , "\n"; // 1, count grapheme units
https://3v4l.org/KSSl4