Why do Unicode emoji property escapes match numbers?

Tags:

javascript

regex

emoji

I found this awesome way to detect emojis using a regex that doesn't use "huge magic ranges" by using a Unicode property escape:

console.log(/\p{Emoji}/u.test('flowers 🌼🌺🌸')) // true
console.log(/\p{Emoji}/u.test('flowers')) // false

But when I shared this knowledge in this answer, @Bronzdragon noticed that \p{Emoji} also matches numbers! Why is that? Numbers are not emojis, are they?

console.log(/\p{Emoji}/u.test('flowers 123')) // unexpectedly true

// regex-only workaround by @Bronzdragon
const regex = /(?=\p{Emoji})(?!\p{Number})/u;
console.log(
  regex.test('flowers'), // false, as expected
  regex.test('flowers 123'), // false, as expected
  regex.test('flowers 123 🌼🌺🌸'), // true, as expected
  regex.test('flowers 🌼🌺🌸'), // true, as expected
)

// more readable workaround
const hasEmoji = str => {
  const nbEmojiOrNumber = (str.match(/\p{Emoji}/gu) || []).length;
  const nbNumber = (str.match(/\p{Number}/gu) || []).length;
  return nbEmojiOrNumber > nbNumber;
}
console.log(
  hasEmoji('flowers'), // false, as expected
  hasEmoji('flowers 123'), // false, as expected
  hasEmoji('flowers 123 🌼🌺🌸'), // true, as expected
  hasEmoji('flowers 🌼🌺🌸'), // true, as expected
)
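Both workarounds above only exclude \p{Number}, so any other character that has the Emoji property but is not a number still gets through. A quick check with the same two functions (re-declared so the snippet runs on its own) suggests that # and *, which Unicode also marks as Emoji, are still reported as emoji:

```javascript
// Same two workarounds as above, re-declared so this snippet is self-contained
const regex = /(?=\p{Emoji})(?!\p{Number})/u;
const hasEmoji = str => {
  const nbEmojiOrNumber = (str.match(/\p{Emoji}/gu) || []).length;
  const nbNumber = (str.match(/\p{Number}/gu) || []).length;
  return nbEmojiOrNumber > nbNumber;
};

// '#' and '*' have Emoji=Yes but are not \p{Number}, so both workarounds flag them
console.log(regex.test('flowers #1'), hasEmoji('flowers #1')); // true true
console.log(regex.test('a * b'), hasEmoji('a * b'));           // true true
```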


asked Oct 16 '20 12:10

Nino Filiu

1 Answer

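According to this post, digits, #, * and a few other characters have the Emoji property set to Yes, which means digits are considered valid emoji characters. They are also listed as Emoji_Component, a set that additionally contains joiners and modifiers such as ZWJ (U+200D) and VARIATION SELECTOR-16 (U+FE0F):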

0023          ; Emoji_Component      #  1.1  [1] (#️)       number sign
002A          ; Emoji_Component      #  1.1  [1] (*️)       asterisk
0030..0039    ; Emoji_Component      #  1.1 [10] (0️..9️)    digit zero..digit nine
200D          ; Emoji_Component      #  1.1  [1] (‍)        zero width joiner
20E3          ; Emoji_Component      #  3.0  [1] (⃣)       combining enclosing keycap
FE0F          ; Emoji_Component      #  3.2  [1] ()        VARIATION SELECTOR-16
1F1E6..1F1FF  ; Emoji_Component      #  6.0 [26] (🇦..🇿)    regional indicator symbol letter a..regional indicator symbol letter z
1F3FB..1F3FF  ; Emoji_Component      #  8.0  [5] (🏻..🏿)    light skin tone..dark skin tone
1F9B0..1F9B3  ; Emoji_Component      # 11.0  [4] (🦰..🦳)    red-haired..white-haired
E0020..E007F  ; Emoji_Component      #  3.1 [96] (󠀠..󠁿)      tag space..cancel tag
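You can confirm this from JavaScript itself. The sketch below (emojiAscii is just an illustrative name) scans the printable ASCII range with \p{Emoji}; the only hits should be the number sign, the asterisk and the ten digits:

```javascript
// Collect every printable ASCII character (U+0020..U+007E) that has Emoji=Yes
const emojiAscii = [];
for (let cp = 0x20; cp <= 0x7e; cp++) {
  const ch = String.fromCodePoint(cp);
  if (/\p{Emoji}/u.test(ch)) emojiAscii.push(ch);
}
console.log(emojiAscii.join('')); // #*0123456789
```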

For example, 1 is an ordinary digit, but when followed by U+FE0F (VARIATION SELECTOR-16) and U+20E3 (COMBINING ENCLOSING KEYCAP) it becomes the keycap emoji 1️⃣:

console.log("1\uFE0F\u20E3 2\uFE0F\u20E3 3\uFE0F\u20E3 4\uFE0F\u20E3 5\uFE0F\u20E3 6\uFE0F\u20E3 7\uFE0F\u20E3 8\uFE0F\u20E3 9\uFE0F\u20E3 0\uFE0F\u20E3")
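To see which part of the keycap sequence carries which property, you can split it into code points and test each one. This is a small sketch assuming a runtime with Unicode property escapes (ES2018 or later); describe is a hypothetical helper name. Only the digit itself has Emoji=Yes, while all three code points are Emoji_Component:

```javascript
// Split the keycap sequence 1️⃣ into its code points and test each one
const keycap = '1\uFE0F\u20E3';
const describe = ch => ({
  codePoint: 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'),
  emoji: /\p{Emoji}/u.test(ch),
  component: /\p{Emoji_Component}/u.test(ch),
});
const parts = [...keycap].map(describe); // spread iterates by code point
console.table(parts);
// U+0031: emoji true,  component true
// U+FE0F: emoji false, component true
// U+20E3: emoji false, component true
```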

If you want to avoid matching digits, use the Extended_Pictographic Unicode property instead:

The Extended_Pictographic characters contain all the Emoji characters except for some Emoji_Components.

So, you may use /\p{Extended_Pictographic}/gu to match most emojis proper, /\p{Extended_Pictographic}/u to test whether a string contains at least one emoji proper, or /[\p{Extended_Pictographic}\u{1F3FB}-\u{1F3FF}\u{1F9B0}-\u{1F9B3}]/u to also match the skin tone modifiers (light to dark) and the hair components (red-haired to white-haired):

const regex_emoji = /[\p{Extended_Pictographic}\u{1F3FB}-\u{1F3FF}\u{1F9B0}-\u{1F9B3}]/u;
console.log( regex_emoji.test('flowers 123') );     // => false
console.log( regex_emoji.test('flowers 🌼🌺🌸') ); // => true
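With the g flag, the same character class can pull out every matching code point instead of only testing for one. A small sketch (regex_emoji_all and extractEmojis are illustrative names): because it works per code point, a ZWJ sequence such as 👨‍👩‍👧 comes back as its individual people, and a keycap such as 1️⃣ is not matched at all, since none of its code points is Extended_Pictographic:

```javascript
// Same class as regex_emoji above, plus the g flag so match() returns every hit
const regex_emoji_all = /[\p{Extended_Pictographic}\u{1F3FB}-\u{1F3FF}\u{1F9B0}-\u{1F9B3}]/gu;
const extractEmojis = str => str.match(regex_emoji_all) || [];

console.log(extractEmojis('flowers 123 🌼🌺🌸')); // [ '🌼', '🌺', '🌸' ]
console.log(extractEmojis('flowers 123'));       // []
// A ZWJ sequence is split into its pictographic parts; the ZWJs are not matched
console.log(extractEmojis('family 👨\u200D👩\u200D👧').length); // 3
// A keycap sequence contains no Extended_Pictographic code point
console.log(extractEmojis('1\uFE0F\u20E3').length); // 0
```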

answered Oct 22 '22 17:10

Wiktor Stribiżew
