 

Count character length of emoji?

I want to validate the name when a new user signs up on my page. One of those checks is whether the name stays within a 100-character limit.

But a single emoji like 👩‍❤️‍💋‍👩 (which is actually 4 emoji combined? see screenshot) counts as much more than 1 character, so I have trouble validating the name. I want to allow emoji in names, because these days it's quite common to have a heart, a star or something similar there, but I don't want to allow names longer than 100 characters.

So I have this question:

  • How can I count every emoji out there as a single character (if that's even possible)?
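One possible approach, sketched here in JavaScript under the assumption that the runtime supports Intl.Segmenter (current browsers, Node.js 16 or later), is to count extended grapheme clusters (what a user perceives as one character) instead of UTF-16 code units. The helper names graphemeLength and isValidNameLength are hypothetical, and the exact cluster boundaries depend on the Unicode data shipped with the runtime:

```javascript
// Segmenter that splits text into extended grapheme clusters
// (user-perceived characters) using the runtime's Unicode data.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Hypothetical helper: number of grapheme clusters in a string.
function graphemeLength(str) {
  return [...segmenter.segment(str)].length;
}

// Hypothetical validation helper: true if the name has at most
// maxGraphemes user-perceived characters.
function isValidNameLength(name, maxGraphemes = 100) {
  return graphemeLength(name) <= maxGraphemes;
}

// The kiss emoji from the question: 8 code points, 1 grapheme cluster.
const kiss = "\u{1F469}\u200D\u2764\uFE0F\u200D\u{1F48B}\u200D\u{1F469}";

console.log(isValidNameLength(kiss.repeat(100))); // true  (100 graphemes)
console.log(isValidNameLength(kiss.repeat(101))); // false (101 graphemes)
```

Since client-side checks can be bypassed, the same limit would still have to be enforced on the server, for example with PHP's grapheme_strlen.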

PS: I'm looking for a PHP solution, but I would also accept JavaScript, even though I don't prefer it.

Edit: My example emoji seems to be this string: \ud83d\udc69\u200d\u2764\ufe0f\u200d\ud83d\udc8b\u200d\ud83d\udc69
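Decoding that escaped string shows why it counts as several characters: it is four visible emoji (woman, heart, kiss mark, woman) joined by ZERO WIDTH JOINER (U+200D), with a VARIATION SELECTOR-16 (U+FE0F) after the heart. A minimal JavaScript sketch that lists the code points:

```javascript
// The escaped string from the question, written as UTF-16 escapes.
const kiss = "\ud83d\udc69\u200d\u2764\ufe0f\u200d\ud83d\udc8b\u200d\ud83d\udc69";

// Spreading a string iterates by Unicode code point, so a surrogate pair
// such as \ud83d\udc69 comes out as a single element.
const codePoints = [...kiss].map(
  (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);

console.log(kiss.length);       // 11 (UTF-16 code units)
console.log(codePoints.length); // 8 (code points)
console.log(codePoints.join(" "));
// U+1F469 U+200D U+2764 U+FE0F U+200D U+1F48B U+200D U+1F469
```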

[Screenshot, not available here: the emoji output mentioned in the question.]

asked Nov 01 '16 14:11 by AlexioVay







2 Answers

As a potential JavaScript solution (if you don't mind adding a library), Lodash has tackled this problem in its toArray function.

For example,

const _ = require('lodash'); // assumes the lodash package is installed
_.toArray('12👪').length; // --> 3

Or, if you want to trim a string to a given number of characters, you can manipulate the array and rejoin it, like this:

_.toArray("👪trimToEightGlyphs").splice(0,8).join(''); // --> '👪trimToE'
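If adding Lodash is not an option, a rough library-free alternative, assuming the runtime supports Intl.Segmenter, is to iterate over grapheme segments and stop after a fixed count. truncateGraphemes is a hypothetical name used only for this sketch:

```javascript
// Truncate a string to at most `max` user-perceived characters without
// Lodash, using the built-in Intl.Segmenter (grapheme granularity).
function truncateGraphemes(str, max) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  const kept = [];
  for (const { segment } of segmenter.segment(str)) {
    if (kept.length === max) break; // stop once `max` clusters are kept
    kept.push(segment);
  }
  return kept.join("");
}

console.log(truncateGraphemes("👪trimToEightGlyphs", 8)); // "👪trimToE"
```

Because segmentation follows the runtime's grapheme rules, a multi-part sequence like the kiss emoji is either kept whole or dropped whole, never cut in the middle.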
answered Sep 30 '22 11:09 by Evan Rusackas



Unicode assigns code points to abstract characters, but what renders them on screen is the font. A font is a collection of graphical shapes, called glyphs, which are the visual representation of a code point or a sequence of code points. A sequence of one or more code points that is displayed as a single graphical unit is called a grapheme.
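The distinction is not limited to emoji. As an illustrative JavaScript sketch (assuming Intl.Segmenter support), each of these strings is shown as one character but is built from two code points; the sample strings were chosen only for this example:

```javascript
// Compare code points with grapheme clusters for strings whose single
// visible character is built from several code points.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

const samples = {
  "e + combining acute": "e\u0301",             // é as base letter + accent
  "German flag": "\u{1F1E9}\u{1F1EA}",          // regional indicators D + E
  "thumbs up, skin tone": "\u{1F44D}\u{1F3FD}", // emoji + skin-tone modifier
};

for (const [label, str] of Object.entries(samples)) {
  const codePoints = [...str].length;
  const graphemes = [...segmenter.segment(str)].length;
  console.log(`${label}: ${codePoints} code points, ${graphemes} grapheme`);
}
```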

If you need the length in grapheme units (rather than code points, which is what mb_strlen counts), you can use grapheme_strlen from the intl extension:

$emoji = "\u{1F469}\u{200D}\u{2764}\u{FE0F}\u{200D}\u{1F48B}\u{200D}\u{1F469}";
echo $emoji , " : " , strlen($emoji) , "\n"; // 27, count bytes
echo $emoji , " : " , mb_strlen($emoji) , "\n"; // 8, count code points
echo $emoji , " : " , grapheme_strlen($emoji) , "\n"; // 1, count grapheme units

https://3v4l.org/KSSl4
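For comparison, a JavaScript sketch of the same three measurements, assuming TextEncoder and Intl.Segmenter are available; the mapping to the PHP functions holds for UTF-8 input:

```javascript
// JavaScript counterparts of strlen, mb_strlen and grapheme_strlen.
const emoji = "\u{1F469}\u{200D}\u{2764}\u{FE0F}\u{200D}\u{1F48B}\u{200D}\u{1F469}";

// UTF-8 byte count, like strlen on a UTF-8 string.
const utf8Bytes = new TextEncoder().encode(emoji).length;
// Code point count, like mb_strlen with UTF-8 encoding.
const codePoints = [...emoji].length;
// Grapheme cluster count, like grapheme_strlen.
const graphemes = [
  ...new Intl.Segmenter(undefined, { granularity: "grapheme" }).segment(emoji),
].length;

console.log(utf8Bytes, codePoints, graphemes); // 27 8 1
```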

answered Sep 30 '22 11:09 by Federkun
