Do emojis occupy a well-defined Unicode range?
And is there a definitive way to check whether a code point is an emoji in Python 2.7?
I cannot seem to find any information on this. A couple of sources have pointed to the range:
\U0001f600-\U0001f650
But for example, 🤘 has the code point
\U0001f918
which lies outside this range.
Thanks.
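The mismatch described in the question can be checked directly. This is a minimal sketch, assuming Python 3 (or a wide Python 2 build, since `ord` on a narrow build cannot handle characters above U+FFFF); the helper name `in_quoted_range` is hypothetical and only exists for this illustration.

```python
# Range quoted in the question (inclusive bounds, exactly as written there).
QUOTED_START = 0x1F600
QUOTED_END = 0x1F650


def in_quoted_range(char):
    """Return True if the single character falls inside the quoted range."""
    return QUOTED_START <= ord(char) <= QUOTED_END


sign_of_the_horns = u"\U0001F918"  # 🤘
grinning_face = u"\U0001F600"      # 😀

print(hex(ord(sign_of_the_horns)))         # code point of 🤘
print(in_quoted_range(sign_of_the_horns))  # outside the quoted range
print(in_quoted_range(grinning_face))      # inside the quoted range
```

So a single contiguous range like this one cannot serve as an emoji test on its own.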
The range of Unicode code points goes from U+0000 to U+10FFFF.
No. Because emoji characters are treated as pictographs, they are encoded in Unicode based primarily on their general appearance, not on an intended semantic.
Emojis look like images or icons, but they are not. They are characters in the Unicode character set, which encodings such as UTF-8 can represent.
The Unicode Standard has assigned numbers to represent emojis. Here's how it works. In the Unicode Standard, each emoji is represented as a "code point" (a hexadecimal number) that looks like U+1F600 (😀), for example.
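To make the idea of a code point concrete, here is a small sketch using only the standard library. It assumes Python 3, where strings are sequences of code points; the helper name `describe` is made up for this example.

```python
import sys
import unicodedata


def describe(char):
    """Return the U+XXXX label and the Unicode name of a single character."""
    code_point = ord(char)
    label = "U+{:04X}".format(code_point)
    # Fall back to a placeholder if this Python's Unicode database
    # has no name for the character.
    name = unicodedata.name(char, "<unnamed>")
    return label, name


# The highest code point Python 3 can represent is U+10FFFF.
print(sys.maxunicode == 0x10FFFF)

print(describe(u"A"))
print(describe(u"\U0001F600"))
```

An emoji and an ordinary letter are handled the same way here: each is a character with a numeric code point and a name.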
The regex module supports matching by Unicode property, but unfortunately it does not (yet?) support the emoji-specific properties. When it does, finding them will be as simple as:
>>> regex.match(ur'\p{Emoji=yes}', u'🤘') # NOTE: Doesn't (yet) work
In the meantime, here's the emoji table from unicode.org.
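Until a property-based match is available, a rough stopgap is to test against the Unicode blocks where many emoji live. The sketch below is an approximation, not a definitive check: the block list is hand-picked for illustration, it is incomplete, and not every code point inside these blocks is an emoji. The name `is_probably_emoji` is hypothetical, and the code assumes Python 3 or a wide Python 2 build.

```python
# Hand-picked Unicode blocks that contain many emoji. This list is an
# assumption for illustration: emoji also exist outside these blocks, and
# the blocks include some code points that are not emoji.
EMOJI_HEAVY_BLOCKS = [
    (0x2600, 0x26FF),    # Miscellaneous Symbols
    (0x2700, 0x27BF),    # Dingbats
    (0x1F300, 0x1F5FF),  # Miscellaneous Symbols and Pictographs
    (0x1F600, 0x1F64F),  # Emoticons
    (0x1F680, 0x1F6FF),  # Transport and Map Symbols
    (0x1F900, 0x1F9FF),  # Supplemental Symbols and Pictographs
]


def is_probably_emoji(char):
    """Heuristic: True if the character lies in one of the listed blocks."""
    code_point = ord(char)
    return any(start <= code_point <= end for start, end in EMOJI_HEAVY_BLOCKS)


print(is_probably_emoji(u"\U0001F918"))  # 🤘
print(is_probably_emoji(u"A"))
```

For anything beyond a quick filter, it is safer to build the lookup from the emoji data that unicode.org publishes rather than from a fixed list of blocks.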