Python - Reading Emoji Unicode Characters

I have a Python 2.7 program which reads iOS text messages from a SQLite database. The text messages are unicode strings. In the following text message:

u'that\u2019s \U0001f63b'

The apostrophe is represented by \u2019, but the emoji is represented by \U0001f63b. I looked up the code point for the emoji in question, and it's \uf63b. I'm not sure where the 0001 is coming from. I know comically little about character encodings.

When I print the text, character by character, using:

s = u'that\u2019s \U0001f63b'

for c in s:
    print c.encode('unicode_escape')

The program produces the following output:

t
h
a
t
\u2019
s

\ud83d
\ude3b

How can I correctly read these last characters in Python? Am I using encode correctly here? Should I just attempt to trash those 0001s before reading it, or is there an easier, less silly way?

asked Jul 07 '15 22:07 by Andrew LaPrise



1 Answer

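I don't think you're using encode correctly, nor do you need to. What you have is a valid unicode string with one 4-digit escape sequence (\u2019) and one 8-digit escape sequence (\U0001f63b). The 8-digit form is needed because the emoji's code point, U+1F63B, does not fit in four hex digits; the leading 000 is just zero-padding. Your loop prints \ud83d and \ude3b because a narrow Python 2 build stores code points above U+FFFF as a UTF-16 surrogate pair, so the emoji counts as two "characters" when you iterate. Try this in the REPL on, say, OS X: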

>>> s = u'that\u2019s \U0001f63b'
>>> print s
that’s 😻
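To see where the 0001 comes from and why the loop in the question printed two halves, here is a small Python 3 sketch (not part of the original answer). The helper name to_surrogates is hypothetical; it applies the standard UTF-16 rule for code points above U+FFFF. The 'surrogatepass' round trip at the end is one standard-library way to rejoin two surrogate halves into the real character.

```python
# Python 3 sketch: how the 8-digit \U escape maps to a code point and to UTF-16 surrogates.

def to_surrogates(cp):
    """Split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair (hypothetical helper)."""
    offset = cp - 0x10000              # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits go in the low surrogate
    return high, low

emoji = '\U0001f63b'                   # the leading zeros only pad the escape to 8 hex digits
cp = ord(emoji)
print(hex(cp))                         # 0x1f63b
print(len(emoji))                      # 1: a single code point in Python 3

high, low = to_surrogates(cp)
print(hex(high), hex(low))             # 0xd83d 0xde3b, the two items a narrow build yields

# Rejoin the two surrogate halves into one character via a UTF-16 round trip.
halves = chr(high) + chr(low)
joined = halves.encode('utf-16-le', 'surrogatepass').decode('utf-16-le')
print(joined == emoji)                 # True
```

The same encode/decode round trip is a handy repair if you ever end up with a Python 3 string that still contains separate surrogate halves.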

In Python 3.3 and later, though, there are no narrow builds, so indexing gives you the whole emoji as one character:

Python 3.4.3 (default, Jul  7 2015, 15:40:07) 
>>> s  = u'that\u2019s \U0001f63b'
>>> s[-1]
'😻'
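For Python 3, a minimal port of the question's character-by-character loop might look like the following. It assumes Python 3.3+ and uses an in-memory SQLite database with a hypothetical message table and text column as a stand-in for the iOS database, whose real schema the question does not give. The sqlite3 module returns TEXT columns as already-decoded str values, so each loop iteration sees one full code point.

```python
import sqlite3

# Hypothetical stand-in for the iOS messages database: one table, one TEXT column.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE message (text TEXT)')
conn.execute('INSERT INTO message (text) VALUES (?)', ('that\u2019s \U0001f63b',))

# sqlite3 returns TEXT values as str, so no manual decoding is needed.
(s,) = conn.execute('SELECT text FROM message').fetchone()

escaped = []
for c in s:  # iterates over code points, so the emoji is a single item
    # unicode_escape yields bytes in Python 3; decode them to show the escape as text
    escaped.append(c.encode('unicode_escape').decode('ascii'))
    print(escaped[-1])

conn.close()
```

Under Python 3 the last line printed is \U0001f63b rather than the two surrogate halves seen in the question.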
answered Oct 31 '22 09:10 by pvg