Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a specific range of unicode code points which can be checked for emojis?

Do emojis occupy a well-defined unicode range?

And, is there a definitive way to check whether a code point is an emoji in python 2.7?

I cannot seem to find any information on this. A couple of sources have pointed to the range:

\U0001f600-\U0001f650

But for example, 🤘 has the code point

\U0001f918

which lies outside this range.

Thanks.

like image 268
Eric Conner Avatar asked Aug 02 '16 21:08

Eric Conner


People also ask

What is the Unicode range for emojis?

The range of Unicode code points goes from U+0000 to U+10FFFF .

Can emojis be represented by Unicode?

No. Because emoji characters are treated as pictographs, they are encoded in Unicode based primarily on their general appearance, not on an intended semantic.

Does UTF 8 contain emojis?

Emojis look like images, or icons, but they are not. They are letters (characters) from the UTF-8 (Unicode) character set.

What encoding is used for emojis?

The Unicode Standard has assigned numbers to represent emojis. Here's how it works. In the Unicode Standard, each emoji is represented as a "code point" (a hexadecimal number) that looks like U+1F063, for example.


1 Answers

regex supports matching by Unicode property, but unfortunately it does not (yet?) support the emoji-specific properties. When it does, finding them will be as simple as:

>>> regex.match(ur'\P{Emoji=yes}', u'🤘') # NOTE: Doesn't (yet) work

In the meantime, here's the emoji table from unicode.org.

like image 142
Ignacio Vazquez-Abrams Avatar answered Nov 03 '22 11:11

Ignacio Vazquez-Abrams