I found this awesome way to detect emojis using a regex that doesn't use "huge magic ranges" by using a Unicode property escape:
console.log(/\p{Emoji}/u.test('flowers 🌼🌺🌸')) // true
console.log(/\p{Emoji}/u.test('flowers')) // false
But when I shared this knowledge in this answer, @Bronzdragon noticed that \p{Emoji}
also matches numbers! Why is that? Numbers are not emojis?
console.log(/\p{Emoji}/u.test('flowers 123')) // unexpectedly true
// regex-only workaround by @Bronzdragon
const regex = /(?=\p{Emoji})(?!\p{Number})/u;
console.log(
regex.test('flowers'), // false, as expected
regex.test('flowers 123'), // false, as expected
regex.test('flowers 123 🌼🌺🌸'), // true, as expected
regex.test('flowers 🌼🌺🌸'), // true, as expected
)
// more readable workaround
const hasEmoji = str => {
  // \p{Emoji} counts digits as well as emojis, so subtract the plain numbers
  const nbEmojiOrNumber = (str.match(/\p{Emoji}/gu) || []).length;
  const nbNumber = (str.match(/\p{Number}/gu) || []).length;
  return nbEmojiOrNumber > nbNumber;
}
console.log(
hasEmoji('flowers'), // false, as expected
hasEmoji('flowers 123'), // false, as expected
hasEmoji('flowers 123 🌼🌺🌸'), // true, as expected
hasEmoji('flowers 🌼🌺🌸'), // true, as expected
)
Unicode property escapes accept either the short or the long form of a property name or value. They can be used to match letters, numbers, symbols, punctuation, spaces, etc.
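As a small illustration of the short and long forms, the sketch below compares a few aliases that are defined for the General_Category and Script properties. The variable names are made up for this example, and it assumes an engine with Unicode property escape support (ES2018 or later):

```javascript
// Short and long aliases of the same General_Category value behave identically.
const shortLetter = /\p{L}/u;
const longLetter = /\p{Letter}/u;
console.log(shortLetter.test('a'), longLetter.test('a')); // true true
console.log(shortLetter.test('1'), longLetter.test('1')); // false false

// Non-binary properties such as Script take a name=value pair, which also has aliases.
const shortGreek = /\p{sc=Greek}/u;
const longGreek = /\p{Script=Greek}/u;
console.log(shortGreek.test('\u03B1'), longGreek.test('\u03B1')); // true true (Greek small alpha)
console.log(shortGreek.test('a'), longGreek.test('a')); // false false
```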
Because emoji characters are treated as pictographs, they are encoded in Unicode based primarily on their general appearance, not on an intended semantic. The meaning of each emoji can vary depending on language, culture, context, and may change or be repurposed by various groups over time.
emoji-regex offers a regular expression to match all emoji symbols and sequences (including textual representations of emoji) as per the Unicode Standard.
Emojis look like images or icons, but they are not. They are characters from the Unicode character set, commonly encoded as UTF-8.
According to this post, digits, #, *, ZWJ and some more chars have the Emoji property set to Yes, which means digits are considered valid emoji chars:
0023 ; Emoji_Component # 1.1 [1] (#️) number sign
002A ; Emoji_Component # 1.1 [1] (*️) asterisk
0030..0039 ; Emoji_Component # 1.1 [10] (0️..9️) digit zero..digit nine
200D ; Emoji_Component # 1.1 [1] () zero width joiner
20E3 ; Emoji_Component # 3.0 [1] (⃣) combining enclosing keycap
FE0F ; Emoji_Component # 3.2 [1] () VARIATION SELECTOR-16
1F1E6..1F1FF ; Emoji_Component # 6.0 [26] (🇦..🇿) regional indicator symbol letter a..regional indicator symbol letter z
1F3FB..1F3FF ; Emoji_Component # 8.0 [5] (🏻..🏿) light skin tone..dark skin tone
1F9B0..1F9B3 ; Emoji_Component # 11.0 [4] (🦰..🦳) red-haired..white-haired
E0020..E007F ; Emoji_Component # 3.1 [96] (..) tag space..cancel tag
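The listing can be checked from JavaScript directly, since both Emoji and Emoji_Component are available as binary property escapes. A minimal sketch, with isEmoji and isEmojiComponent as hypothetical helper names introduced only for this example:

```javascript
// Hypothetical helpers: test a single code point against one property.
const isEmoji = ch => /^\p{Emoji}$/u.test(ch);
const isEmojiComponent = ch => /^\p{Emoji_Component}$/u.test(ch);

for (const ch of ['1', '#', '*', '\u{1F33C}', 'a']) {
  console.log(JSON.stringify(ch), isEmoji(ch), isEmojiComponent(ch));
}
// "1" true true
// "#" true true
// "*" true true
// "🌼" true false
// "a" false false
```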
For example, 1 is a digit, but it becomes an emoji when combined with the U+FE0F and U+20E3 chars: 1️⃣:
console.log("1\uFE0F\u20E3 2\uFE0F\u20E3 3\uFE0F\u20E3 4\uFE0F\u20E3 5\uFE0F\u20E3 6\uFE0F\u20E3 7\uFE0F\u20E3 8\uFE0F\u20E3 9\uFE0F\u20E3 0\uFE0F\u20E3")
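To make the structure of such a keycap sequence visible, the following sketch splits it into its code points. The variable names are illustrative only:

```javascript
// A keycap emoji is three code points: the digit, VARIATION SELECTOR-16,
// and COMBINING ENCLOSING KEYCAP.
const keycapOne = '1\uFE0F\u20E3';
const keycapCodePoints = Array.from(
  keycapOne,
  ch => 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
);
console.log(keycapCodePoints); // ['U+0031', 'U+FE0F', 'U+20E3']

// The base digit on its own already has the Emoji property,
// which is why \p{Emoji} matches plain numbers.
console.log(/\p{Emoji}/u.test('1')); // true
```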
If you want to avoid matching digits, use the Extended_Pictographic Unicode property:
The Extended_Pictographic characters contain all the Emoji characters except for some Emoji_Components.
So, you may use /\p{Extended_Pictographic}/gu to match all emojis proper, /\p{Extended_Pictographic}/u to test for a single emoji proper, or /[\p{Extended_Pictographic}\u{1F3FB}-\u{1F3FF}\u{1F9B0}-\u{1F9B3}]/u to also match the skin tone chars (light to dark) and the hair chars (red-haired to white-haired):
const regex_emoji = /[\p{Extended_Pictographic}\u{1F3FB}-\u{1F3FF}\u{1F9B0}-\u{1F9B3}]/u;
console.log( regex_emoji.test('flowers 123') ); // => false
console.log( regex_emoji.test('flowers 🌼🌺🌸') ); // => true
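Since Extended_Pictographic excludes digits, keycap emojis such as 1️⃣ are no longer matched. If those should count as emojis too, one possible approach is to add an explicit alternative for keycap sequences. The sketch below is an assumption-laden extension of the regex above, not part of the original answer; findEmojis is a hypothetical helper name:

```javascript
// Keycap sequences (digit, # or *, optional U+FE0F, then U+20E3) are tried first,
// then single pictographic characters plus skin tone and hair components.
const emojiOrKeycap =
  /[0-9#*]\uFE0F?\u20E3|[\p{Extended_Pictographic}\u{1F3FB}-\u{1F3FF}\u{1F9B0}-\u{1F9B3}]/gu;

// Hypothetical helper: return every match as an array (empty if none).
const findEmojis = str => str.match(emojiOrKeycap) || [];

console.log(findEmojis('flowers 123').length); // 0
console.log(findEmojis('flowers \u{1F33C}\u{1F33A}\u{1F338}').length); // 3
console.log(findEmojis('press 1\uFE0F\u20E3 now').length); // 1
```

Note that this still matches code points individually, so a ZWJ sequence such as a family emoji is reported as several matches rather than one.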