UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128) [duplicate]

Tags: python, utf-8, decode, python-2.7

I'm simply trying to decode a \uXXXX\uXXXX\uXXXX-like string, but I get an error:

$ python
Python 2.7.6 (default, Sep  9 2014, 15:04:36) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> print u'\u041e\u043b\u044c\u0433\u0430'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)

I'm a Python newbie. What's the problem? Thanks!

891

asked Feb 16 '15 15:02

Serhii Matrunchyk

2 Answers

Python is trying to be helpful. You cannot decode Unicode data, because it is already decoded. So when you call .decode() on a unicode object, Python 2 first encodes the data (using the ASCII codec) to get bytes that it can then decode. It is this implicit encoding that fails.
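The implicit step can be spelled out explicitly. The following is only a sketch of the behaviour, not Python's actual implementation; `implicit_decode` is a hypothetical helper name, and the code is written so that it also runs on Python 3, where the `u` prefix is optional:

```python
text = u'\u041e\u043b\u044c\u0433\u0430'

def implicit_decode(value):
    """Hypothetical helper that mimics unicode.decode('utf-8') in Python 2."""
    # Step 1 (implicit in Python 2): turn the text into bytes with the ASCII codec.
    as_bytes = value.encode('ascii')
    # Step 2 (what was actually asked for): decode those bytes as UTF-8.
    return as_bytes.decode('utf-8')

try:
    implicit_decode(text)
except UnicodeEncodeError as exc:
    # The failure comes from step 1, which is why a *decode* call
    # reports an *encode* error.
    error_message = str(exc)
    print(error_message)
```

Pure-ASCII text passes step 1, which is why the same call appears to work until the first non-ASCII character shows up.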

If you have Unicode data, it only makes sense to encode to UTF-8, not decode:

>>> print u'\u041e\u043b\u044c\u0433\u0430'
Ольга
>>> u'\u041e\u043b\u044c\u0433\u0430'.encode('utf8')
'\xd0\x9e\xd0\xbb\xd1\x8c\xd0\xb3\xd0\xb0'
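As a small sketch of the two directions (variable names are illustrative only), encoding goes from text to bytes and decoding goes from bytes back to text. The snippet runs on Python 2.7 and Python 3:

```python
text = u'\u041e\u043b\u044c\u0433\u0430'

encoded = text.encode('utf-8')      # text -> bytes
decoded = encoded.decode('utf-8')   # bytes -> text

# Each of the five Cyrillic letters takes two bytes in UTF-8.
print(len(text))
print(len(encoded))

# The round trip gives back the original text.
assert decoded == text
```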

If you wanted a Unicode value, then using a Unicode literal (u'...') is all you needed to do. No further decoding is necessary.
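If the \uXXXX sequences instead arrive as literal backslash text, for example read from a file or a network response (an assumption; the question does not say so), the standard `unicode_escape` codec can turn them into text. A minimal sketch, with `raw` as a made-up sample value:

```python
# Assumed input: the six ASCII characters "\u041e" and so on, not real Cyrillic letters.
raw = b'\\u041e\\u043b\\u044c\\u0433\\u0430'

# unicode_escape interprets the backslash escapes and returns text.
text = raw.decode('unicode_escape')

assert text == u'\u041e\u043b\u044c\u0433\u0430'
print(len(raw))
print(len(text))
```

This codec treats any non-ASCII byte as Latin-1, so it is only a safe choice when the input is pure ASCII escapes like the sample above.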

The same implicit conversion takes place in the other direction; if you tried to encode a bytestring you'd trigger an implicit decoding:

>>> u'\u041e\u043b\u044c\u0433\u0430'.encode('utf8').encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)
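The reverse case can be sketched the same way. This is an illustration of the behaviour rather than Python's real code path; `implicit_encode` is a hypothetical helper, and the explicit form is used so the snippet also runs on Python 3, where bytes have no `.encode()` method:

```python
data = u'\u041e\u043b\u044c\u0433\u0430'.encode('utf-8')  # UTF-8 bytes

def implicit_encode(value):
    """Hypothetical helper that mimics str.encode('utf-8') on a Python 2 bytestring."""
    # Step 1 (implicit in Python 2): turn the bytes into text with the ASCII codec.
    as_text = value.decode('ascii')
    # Step 2 (what was actually asked for): encode that text as UTF-8.
    return as_text.encode('utf-8')

try:
    implicit_encode(data)
except UnicodeDecodeError as exc:
    # Step 1 fails on the first byte above 0x7f.
    error_message = str(exc)
    print(error_message)
```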

149

answered Sep 19 '22 07:09

Martijn Pieters

In Python 2, you can set the default encoding to UTF-8:

import sys
reload(sys)  # restores sys.setdefaultencoding, which site.py removes at startup
sys.setdefaultencoding('utf-8')
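To see which codec the implicit conversions use, the current default can be inspected first. A minimal sketch; `default_encoding` is just an illustrative variable name:

```python
import sys

# The codec Python falls back to for implicit str/unicode conversions.
default_encoding = sys.getdefaultencoding()
print(default_encoding)
```

On a stock Python 2 this reports ascii, which is the codec named in the error message. Python 3 reports utf-8 and no longer provides sys.setdefaultencoding.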

answered Sep 18 '22 07:09

Ranvijay Sachan
