Python - 'ascii' codec can't decode byte

People also ask

How do I fix UnicodeEncodeError in Python?

A given encoding maps only a limited set of Unicode characters to bytes. Any character the encoding cannot represent causes the encoding to fail and raises UnicodeEncodeError. To avoid this error, use encode('utf-8') and decode('utf-8') where appropriate in your code.
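
As a rough illustration of that failure mode, here is a minimal sketch written for Python 3, where str is the text type and bytes holds encoded data (the question itself concerns Python 2). The variable names are only illustrative.

```python
# Python 3 sketch: str is text, bytes is encoded data.
text = "\u4f60\u597d"  # the two characters from the question

# UTF-8 can represent every Unicode character, so this succeeds.
utf8_bytes = text.encode("utf-8")

# ASCII only covers code points 0-127, so the same text cannot be encoded.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    failure_reason = exc.reason

print(utf8_bytes)
print(ascii_failed)
```

The point is that the choice of target encoding decides whether the call succeeds, not the text itself.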

What is Unicode decode error in Python?

The UnicodeDecodeError normally happens when decoding a str (byte string) from a particular encoding. Since an encoding maps only a limited number of byte sequences to unicode characters, an illegal sequence of bytes will cause the encoding-specific decode() to fail.
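
A small sketch of what "illegal sequence" can mean in practice, again in Python 3 syntax. The sample byte strings are made up for illustration.

```python
# Python 3 sketch: decoding turns bytes into text and needs the right codec.
raw = b"\xe4\xbd\xa0\xe5\xa5\xbd"  # UTF-8 bytes for the two characters in the question

decoded = raw.decode("utf-8")  # valid UTF-8, so this works

# Cutting a multi-byte sequence in half leaves bytes that UTF-8 cannot decode.
try:
    raw[:2].decode("utf-8")
    truncated_ok = True
except UnicodeDecodeError:
    truncated_ok = False

# Bytes written in one encoding are often illegal in another.
latin1_bytes = b"caf\xe9"  # "cafe" with an accented e, encoded as Latin-1
try:
    latin1_bytes.decode("utf-8")
    wrong_codec_ok = True
except UnicodeDecodeError:
    wrong_codec_ok = False

right_codec_text = latin1_bytes.decode("latin-1")
```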

What is the Python codecs module?

The codecs module defines a set of base classes that specify the codec interface and can also be used to write your own codecs for use in Python. Each codec has to define four interfaces to make it usable as a codec in Python: stateless encoder, stateless decoder, stream reader and stream writer.
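
To see those four interfaces on an existing codec, one option is codecs.lookup, which returns a CodecInfo object. The sketch below (Python 3) assumes the built-in UTF-8 codec and uses in-memory streams purely for demonstration.

```python
import codecs
import io

info = codecs.lookup("utf-8")

# Stateless encoder and decoder: each returns (result, length consumed).
encoded, chars_consumed = info.encode("\u4f60\u597d")
decoded, bytes_consumed = info.decode(encoded)

# Stream writer: encodes text as it is written to a byte stream.
buffer = io.BytesIO()
writer = info.streamwriter(buffer)
writer.write("\u4f60\u597d")

# Stream reader: decodes bytes as they are read from a byte stream.
reader = info.streamreader(io.BytesIO(buffer.getvalue()))
round_tripped = reader.read()
```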

"你好".encode('utf-8')

In Python 2, encode converts a unicode object to a str (byte string) object. But here you have invoked it on a byte string, because the literal has no u prefix. So Python has to convert the byte string to a unicode object first, and it does the equivalent of

"你好".decode().encode('utf-8')

But the decode fails because the string isn't valid ascii. That's why you get a complaint about not being able to decode.
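
The hidden step can be imitated in Python 3, which never performs it implicitly. The helper py2_style_encode below is hypothetical and exists only to mimic what Python 2 does behind the scenes; it is a sketch, not how the interpreter is implemented.

```python
def py2_style_encode(byte_string, encoding, default_encoding="ascii"):
    """Mimic Python 2's str.encode: an implicit decode followed by the encode."""
    text = byte_string.decode(default_encoding)  # the hidden step that fails
    return text.encode(encoding)

raw = b"\xe4\xbd\xa0\xe5\xa5\xbd"  # the bytes behind the un-prefixed literal

try:
    py2_style_encode(raw, "utf-8")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

# Pure-ASCII byte strings survive the hidden decode, which is why the
# problem often goes unnoticed until non-ASCII data shows up.
ascii_only = py2_style_encode(b"hello", "utf-8")

print(message)
```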

Always encode from unicode to bytes.
In this direction, you get to choose the encoding.

>>> u"你好".encode("utf8")
'\xe4\xbd\xa0\xe5\xa5\xbd'
>>> print _
你好

The other way is to decode from bytes to unicode.
In this direction, you have to know what the encoding is.

>>> raw = '\xe4\xbd\xa0\xe5\xa5\xbd'
>>> print raw
你好
>>> raw.decode('utf-8')
u'\u4f60\u597d'
>>> print _
你好

This point can't be stressed enough. If you want to avoid playing unicode "whack-a-mole", it's important to understand what's happening at the data level. Here it is explained another way:

A unicode object is decoded already, you never want to call decode on it.
A bytestring object is encoded already, you never want to call encode on it.
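
One way to apply these two rules is to check the type before converting. The helpers to_text and to_bytes below are hypothetical names, written in Python 3 syntax where the text type is str and the byte type is bytes; UTF-8 as the default is an assumption.

```python
def to_text(value, encoding="utf-8"):
    """Return text; decode only when the value is still encoded bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already decoded, so leave it alone

def to_bytes(value, encoding="utf-8"):
    """Return bytes; encode only when the value is still text."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value  # already encoded, so leave it alone

text = "\u4f60\u597d"
raw = b"\xe4\xbd\xa0\xe5\xa5\xbd"

same_text = to_text(text)
from_bytes = to_text(raw)
same_raw = to_bytes(raw)
from_text = to_bytes(text)
```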

Now, on seeing .encode on a byte string, Python 2 first tries to implicitly convert it to text (a unicode object). Similarly, on seeing .decode on a unicode string, Python 2 implicitly tries to convert it to bytes (a str object).

These implicit conversions are why you can get a UnicodeDecodeError when you've called encode. Encoding expects an object of type unicode; when it receives a str instead, Python 2 implicitly decodes it into a unicode object before re-encoding it with the requested encoding. This conversion uses the default 'ascii' decoder†, giving you the decoding error inside an encoder.

In fact, in Python 3 the methods str.decode and bytes.encode don't even exist. Their removal was a controversial attempt to avoid this common confusion.

† ...or whatever encoding sys.getdefaultencoding() reports; usually this is 'ascii'.
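
The Python 3 behaviour described above can be checked directly. This is a quick sketch for a Python 3 interpreter only; it does not apply to Python 2.

```python
# In Python 3 the "wrong direction" methods are simply absent.
has_str_decode = hasattr(str, "decode")
has_bytes_encode = hasattr(bytes, "encode")

# Text and bytes are never mixed implicitly either: this raises TypeError
# instead of attempting a silent ASCII conversion.
try:
    "abc" + b"abc"
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(has_str_decode, has_bytes_encode, mixed_ok)
```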

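You can try this (Python 2 only). The reload(sys) call is needed because setdefaultencoding is removed from the sys module at startup, and the change affects the whole process: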

import sys
reload(sys)
sys.setdefaultencoding("utf-8")
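
A different approach, instead of changing the process-wide default, is to decode explicitly where bytes enter the program and encode where they leave. The sketch below assumes UTF-8 files and uses io.open, which accepts an encoding argument in Python 2.6+ as well as Python 3; the temporary file name is a placeholder.

```python
import io
import os
import tempfile

text = u"\u4f60\u597d"
path = os.path.join(tempfile.mkdtemp(), "greeting.txt")  # placeholder file

# Encode on the way out: the file object turns text into UTF-8 bytes.
with io.open(path, "w", encoding="utf-8") as handle:
    handle.write(text)

# Decode on the way in: the file object turns UTF-8 bytes back into text.
with io.open(path, "r", encoding="utf-8") as handle:
    read_back = handle.read()

# Reading in binary mode shows the encoded bytes that are actually on disk.
with io.open(path, "rb") as handle:
    on_disk = handle.read()
```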

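You can also try the following.

Add this line at the top of your .py file. It declares the encoding of the source file itself, so the interpreter accepts non-ASCII characters in literals: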

# -*- coding: utf-8 -*-

If you're using Python < 3, you'll need to tell the interpreter that your string literal is Unicode by prefixing it with a u:

Python 2.7.2 (default, Jan 14 2012, 23:14:09) 
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> "你好".encode("utf8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
>>> u"你好".encode("utf8")
'\xe4\xbd\xa0\xe5\xa5\xbd'

Further reading: Unicode HOWTO.

You use u"你好".encode('utf8') to encode a unicode string. But if you start from the byte string "你好" and want the text it represents, you should decode it instead, like this:

"你好".decode("utf8")

You will get what you want. Maybe you should learn more about encode & decode.

If you're dealing with Unicode, then instead of encode('utf-8') you can sometimes just drop the characters that cannot be converted, e.g.

u"你好".encode('ascii', 'ignore')

(note the u prefix; on a plain byte string the call fails with the same UnicodeDecodeError), or something.decode('unicode_escape').encode('ascii', 'ignore').

This is not particularly useful in this example, since every character is dropped and the result is an empty string, but it can work better in other scenarios where some special characters cannot be converted.

Alternatively, you can consider replacing particular characters using replace().
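
For comparison, here is a short sketch of several standard error handlers that encode() accepts, in Python 3 syntax. The sample text mixes ASCII and non-ASCII characters and is made up for illustration.

```python
text = "h\u00e9llo \u4f60\u597d"  # "hello" with an accented e, then the two characters

# 'ignore' silently drops anything ASCII cannot represent.
ignored = text.encode("ascii", "ignore")

# 'replace' substitutes a question mark for each such character.
replaced = text.encode("ascii", "replace")

# 'backslashreplace' keeps the information as Python-style escapes.
escaped = text.encode("ascii", "backslashreplace")

# 'xmlcharrefreplace' keeps it as XML/HTML numeric character references.
xml_refs = text.encode("ascii", "xmlcharrefreplace")

for result in (ignored, replaced, escaped, xml_refs):
    print(result)
```

The first two handlers lose data, while the last two keep enough information to recover the original characters later.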
