 

Python write (iPhone) Emoji to a file

I have been trying to write a simple script that saves user input (originating from an iPhone) to a text file. The issue I'm having is that when a user includes an Emoji character, the write fails.

OS: Ubuntu

Python Version: 2.7.3

My code currently looks like this:

import codecs

f = codecs.open(path, "w+", encoding="utf8")
f.write("Desc: " + json_obj["description"])
f.close()

When an Emoji character is passed in the description variable, I get the error:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 7-8: ordinal not in range(128)

Any possible help is appreciated.

Asked Jul 08 '13 by wtf_are_my_initials


1 Answer

The most likely problem here is that json_obj["description"] is actually a UTF-8-encoded str, not a unicode. So, when you try to write it to a codecs-wrapped file, Python has to decode it from str to unicode so it can re-encode it. And that's the part that fails, because that automatic decoding uses sys.getdefaultencoding(), which is 'ascii'.

For example:

>>> import codecs
>>> f = codecs.open('emoji.txt', 'w+', encoding='utf-8')
>>> e = u'\U0001f1ef'
>>> print e
🇯
>>> e
u'\U0001f1ef'
>>> f.write(e)
>>> e8 = e.encode('utf-8')
>>> e8
'\xf0\x9f\x87\xaf'
>>> f.write(e8)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)
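
You can see that implicit ASCII decode directly (continuing the same interpreter session; nothing here is file-specific, it's just the default codec at work):

>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> e8.decode(sys.getdefaultencoding())
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)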

There are two possible solutions here.

First, you can explicitly decode everything to unicode as early as possible. I'm not sure where your json_obj is coming from, but I suspect it's not actually the stdlib json.loads, because by default, that always gives you unicode keys and values. So, replacing whatever you're using for JSON with the stdlib functions will probably solve the problem.
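
For instance, here's a minimal sketch of that approach, assuming the JSON arrives as a plain str (the payload and the output path are made up for illustration):

import codecs
import json

raw = '{"description": "party \\ud83c\\udf89"}'   # hypothetical JSON payload with an emoji escape
json_obj = json.loads(raw)                        # stdlib json returns unicode keys and values

path = "emoji.txt"                                # hypothetical output path
f = codecs.open(path, "w+", encoding="utf-8")
f.write(u"Desc: " + json_obj["description"])      # unicode in, UTF-8 bytes out; no implicit ASCII decode
f.close()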

Second, you can leave everything as UTF-8 str objects and stay in binary mode. If you know you have UTF-8 everywhere, just open the file with the built-in open instead of codecs.open, and write the bytes without any encoding.
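
A minimal sketch of that second approach, with a made-up UTF-8 str standing in for json_obj["description"]:

description = u"party \U0001F389".encode("utf-8")   # stand-in for a UTF-8-encoded str from your JSON layer

f = open("emoji.txt", "wb")          # plain built-in open, binary mode, no codec wrapper
f.write("Desc: " + description)      # str + str, so nothing gets implicitly decoded
f.close()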


Also, you should strongly consider using io.open instead of codecs.open. It has a number of advantages, including:

  • Raises an exception instead of doing the wrong thing if you pass it incorrect values.
  • Often faster.
  • Forward-compatible with Python 3.
  • Has a number of bug fixes that will never be back-ported to codecs.

The only disadvantage is that it's not backward compatible with Python 2.5 and earlier. Unless that matters to you, don't use codecs.
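
Here's what the same write looks like with io.open (a short sketch; the file name is arbitrary). Note that it insists on unicode input instead of silently mis-decoding a str:

import io

with io.open("emoji.txt", "w", encoding="utf-8") as f:
    f.write(u"Desc: \U0001F389")     # must be unicode; passing a str here raises TypeError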

Answered Sep 22 '22 by abarnert