
'ascii' codec can't decode byte 0xe9

I have done some research and seen solutions, but none have worked for me.

Python - 'ascii' codec can't decode byte

That didn't work for me. I know that 0xe9 is the é character, but I still can't figure out how to get this working. Here is my code:

output_lines = ['<menu>', '<day name="monday">', '<meal name="BREAKFAST">', '<counter name="Entreé">', '<dish>', '<name icon1="Vegan" icon2="Mindful Item">', 'Cream of Wheat (Farina)','</name>', '</dish>', '</counter >', '</meal >', '</day >', '</menu >']
output_string = '\n'.join([line.encode("utf-8") for line in output_lines])

This gives me the error: 'ascii' codec can't decode byte 0xe9

I have tried decoding, and I have tried replacing the "é", but I can't get either to work.

iqueqiorio asked Mar 09 '15 16:03




1 Answer

You are trying to encode bytestrings:

>>> '<counter name="Entreé">'.encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 20: ordinal not in range(128)

Python is trying to be helpful here. You can only encode a Unicode string to bytes, so to encode a bytestring Python 2 first implicitly decodes it, using the default encoding (ASCII). That implicit decode is the step that fails.
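To make that hidden step visible, the implicit decode can be spelled out by hand. This is only a sketch: the name `raw` is mine, it assumes the source file was saved as UTF-8 (which is what the 0xc3 byte in the traceback suggests), and it is written so it also runs on Python 3, where the conversion is never implicit.

```python
# Bytes as they would sit in a Python 2 str literal, assuming a UTF-8 source file.
# The e-acute is written as an escape so the example does not depend on file encoding.
raw = u'<counter name="Entre\xe9">'.encode('utf-8')

error = None
try:
    # Roughly what Python 2 does for raw.encode('utf8'):
    # decode with the default codec (ASCII) first, then encode the result.
    raw.decode('ascii').encode('utf-8')
except UnicodeDecodeError as exc:
    error = exc

print(error)
```

The decode step fails on the first byte of the two-byte UTF-8 sequence for é, before any encoding happens.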

The solution is to not encode data that is already encoded. If the data was encoded with a different codec than the one you need, first decode it using a suitable codec, then encode it again.
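As an illustration of the decode-then-encode route, suppose the bytes were actually Latin-1, where é is the single byte 0xe9 mentioned in the question. That is an assumption about the input, not something the question confirms, and the variable names are hypothetical.

```python
# Hypothetical input: the same tag as Latin-1 bytes, where e-acute is the byte 0xe9.
latin1_bytes = b'<counter name="Entre\xe9">'

# Decode with the codec the data was actually written in...
text = latin1_bytes.decode('latin-1')

# ...then encode to the codec that is needed.
utf8_bytes = text.encode('utf-8')
```

If the data is already UTF-8, neither step is needed and the bytes can be joined as they are.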

If you have a mix of unicode and bytestring values, decode just the bytestrings or encode just the unicode values; try to avoid mixing the types. The following decodes byte strings to unicode first:

def ensure_unicode(v):
    if isinstance(v, str):
        v = v.decode('utf8')
    return unicode(v)  # convert anything not a string to unicode too

output_string = u'\n'.join([ensure_unicode(line) for line in output_lines])
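
The function above is Python 2 code (`str` is the byte type there and `unicode` is the text type). On Python 3 a rough equivalent might look like the following sketch; the name `ensure_text`, the `encoding` parameter, and the sample list are illustrative and not part of the original answer.

```python
def ensure_text(value, encoding='utf-8'):
    """Return value as text, decoding bytes with the given codec (Python 3 only)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)  # convert anything that is not bytes to text too


# Hypothetical mixed input: UTF-8 bytes, text, and a non-string value.
mixed_lines = [b'<counter name="Entre\xc3\xa9">', '<dish>', 42]
output_string = '\n'.join(ensure_text(line) for line in mixed_lines)
```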
Martijn Pieters answered Sep 28 '22 06:09
