
Decoding emojis from tweets in Python 3

I have a simple Python script that gets the text of a tweet.

However, emojis are somehow encoded, so they look like this in the output: \xf0\x9f\x90\xa3

Is there a way to find out what emoji this is from this output?

EyfI asked May 17 '26 22:05


1 Answer

Odds are it's UTF-8 encoded (along with the rest of the data; you only notice it on the emoji because ASCII text is encoded identically in ASCII and UTF-8).
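To illustrate why only the emoji looks "encoded", here is a small sketch. The sample strings are my own, not taken from the asker's tweet data; it just compares the bytes produced for plain ASCII text and for an emoji.

```python
# ASCII-only text produces the same bytes under both encodings,
# so it looks "normal" even when it is really UTF-8.
plain = 'hello'
same_bytes = plain.encode('ascii') == plain.encode('utf-8')
print(same_bytes)     # True

# An emoji is outside ASCII, so UTF-8 needs four bytes for it.
chick = '\U0001F423'
encoded = chick.encode('utf-8')
print(encoded)        # b'\xf0\x9f\x90\xa3' (repr shows \x escapes)
print(len(encoded))   # 4
```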

If you have a bytes object like b'\xf0\x9f\x90\xa3', you'd just do:

b = b'\xf0\x9f\x90\xa3'
txt = b.decode('utf-8')

If you received it as a str, this is probably a mistaken decoding as latin-1 or some other code page, so just undo it and redo with UTF-8:

b = '\xf0\x9f\x90\xa3'
txt = b.encode('latin-1').decode('utf-8')
# If it's not latin-1, it could be the locale's code page (see locale.getpreferredencoding())
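If you don't know up front which case you're in, a rough sketch of a helper that handles both might look like this. The name fix_mojibake and the wrong_encoding parameter are hypothetical, and it assumes the text was mis-decoded with a single-byte code page such as latin-1; anything that doesn't round-trip is returned unchanged.

```python
def fix_mojibake(text, wrong_encoding='latin-1'):
    """Return text decoded as UTF-8, undoing a mistaken decode if needed.

    Hypothetical helper: assumes `text` is either raw UTF-8 bytes or a
    str that was wrongly decoded with `wrong_encoding`.
    """
    if isinstance(text, bytes):
        # Raw bytes: a plain UTF-8 decode is all that is needed.
        return text.decode('utf-8')
    try:
        # Undo the mistaken decode, then redo it as UTF-8.
        return text.encode(wrong_encoding).decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not mojibake (or not this code page): leave the text alone.
        return text


from_bytes = fix_mojibake(b'\xf0\x9f\x90\xa3')
from_str = fix_mojibake('\xf0\x9f\x90\xa3')
```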

Either way, txt ends up as a single character with an ordinal of 0x1f423 (my computer can't display it, or I'd have added it here), which is in the range used by most emoji. As noted in the comments, unicodedata reports the character's name as HATCHING CHICK.
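To check which emoji you got without relying on your terminal's fonts, you can ask the standard library's unicodedata module for the character's name. A minimal sketch, using the bytes from the question:

```python
import unicodedata

raw = b'\xf0\x9f\x90\xa3'
ch = raw.decode('utf-8')

code_point = hex(ord(ch))
name = unicodedata.name(ch)

print(code_point)  # 0x1f423
print(name)        # HATCHING CHICK
```

unicodedata.name raises ValueError for characters that have no name in Python's bundled Unicode database, so pass a default as the second argument (for example unicodedata.name(ch, 'UNKNOWN')) if you're processing arbitrary tweets.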

ShadowRanger answered May 19 '26 12:05