I have been trying to write a simple script that can save user input (originating from an iPhone) to a text file. The issue I'm having is that when a user uses an Emoji icon, it breaks the whole thing.
OS: Ubuntu
Python Version: 2.7.3
My code currently looks like this:
import codecs

f = codecs.open(path, "w+", encoding="utf8")
f.write("Desc: " + json_obj["description"])
f.close()
When an Emoji character is passed in the description variable, I get the error:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 7-8: ordinal not in range(128)
Any possible help is appreciated.
The most likely problem here is that json_obj["description"] is actually a UTF-8-encoded str, not a unicode. So, when you try to write it to a codecs-wrapped file, Python has to decode it from str to unicode so it can re-encode it. And that's the part that fails, because that automatic decoding uses sys.getdefaultencoding(), which is 'ascii'.
For example:
>>> f = codecs.open('emoji.txt', 'w+', encoding='utf-8')
>>> e = u'\U0001f1ef'
>>> print e
🇯
>>> e
u'\U0001f1ef'
>>> f.write(e)
>>> e8 = e.encode('utf-8')
>>> e8
'\xf0\x9f\x87\xaf'
>>> f.write(e8)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)
There are two possible solutions here.
First, you can explicitly decode everything to unicode as early as possible. I'm not sure where your json_obj is coming from, but I suspect it's not actually the stdlib json.loads, because by default, that always gives you unicode keys and values. So, replacing whatever you're using for JSON with the stdlib functions will probably solve the problem.
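For example, something along these lines should work (a minimal sketch; the raw JSON string and the file name below are made-up stand-ins for whatever the iPhone actually sends and wherever you actually write):

import json
import codecs

raw = '{"description": "\xf0\x9f\x98\x85"}'   # hypothetical payload: UTF-8 bytes containing an emoji
json_obj = json.loads(raw)                    # stdlib json.loads returns unicode values in Python 2
f = codecs.open("desc.txt", "w+", encoding="utf-8")
f.write(u"Desc: " + json_obj["description"])  # unicode all the way; encoded exactly once, on write
f.close()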
Second, you can leave everything as UTF-8 str objects and stay in binary mode. If you know you have UTF-8 everywhere, just open the file instead of codecs.open, and write without any encoding.
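A minimal sketch of that approach, assuming the description really is a UTF-8 str (the literal below is just the UTF-8 bytes of one emoji, used as a stand-in):

desc = '\xf0\x9f\x98\x85'         # UTF-8-encoded str, e.g. from a JSON parser that doesn't decode
f = open("desc.txt", "wb")        # plain built-in open in binary mode, no codec layer
f.write("Desc: " + desc)          # str + str: no implicit decode, so no UnicodeDecodeError
f.close()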
Also, you should strongly consider using io.open instead of codecs.open. It has a number of advantages over codecs; the only disadvantage is that it's not backwards compatible with Python 2.5. Unless that matters to you, don't use codecs.
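For example (a sketch; io.open behaves like Python 3's built-in open, so a file opened in text mode expects unicode text):

import io

f = io.open("desc.txt", "w", encoding="utf-8")
f.write(u"Desc: \U0001F605")      # must be unicode; io.open encodes it on write
f.close()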