Error : 'utf-8' codec can't decode byte 0xb0 in position 14: invalid start byte

Question

I'm a beginner at Python, and I would like to read multiple CSV files. When I read them with encoding="ISO-8859-1", I get this kind of character in my data: "DÂ°faut". So I tried to read them as utf-8 instead, and I get this error: 'utf-8' codec can't decode byte 0xb0 in position 14: invalid start byte. Can someone help me please? Thank you!

mk1x86 · Accepted Answer

If you decode with utf-8, the data must also have been encoded with utf-8. For every character outside plain ASCII (basically everything except basic latin letters, digits and the usual symbols), utf-8 needs multiple bytes to store it. Since the file is read byte by byte, the decoder needs to know whether a byte starts a new character and how many bytes belong to it. This is indicated by the leading bits of each byte: 0xxxxxxx is a complete single-byte character, 110xxxxx, 1110xxxx and 11110xxx start a sequence of 2, 3 or 4 bytes, and 10xxxxxx is a continuation byte that may only appear inside such a sequence. 0xb0 translates to 1011 0000 in binary, so it is a continuation byte. The decoder found it where a new character should start, which is exactly what "invalid start byte" means, and decoding fails. In iso-8859-1 the single byte 0xB0 is the degree symbol (°), so the file that raises this error was most likely saved in iso-8859-1 or a similar single-byte code page. In utf-8 the degree symbol would be encoded as 0xC2 0xB0.
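To see this in Python, here is a minimal sketch. The helper `utf8_byte_kind` is a made-up name for illustration, and it only looks at the leading bits, so it is not a full UTF-8 validator.

```python
# The degree sign is one byte in ISO-8859-1 but two bytes in UTF-8.
degree = "\u00b0"  # the character °
latin1_bytes = degree.encode("iso-8859-1")  # b'\xb0'
utf8_bytes = degree.encode("utf-8")         # b'\xc2\xb0'

print(format(0xB0, "08b"))  # 10110000


def utf8_byte_kind(byte):
    """Classify a byte value (0-255) by its leading bits.

    Hypothetical helper for illustration only: it ignores the few
    values (such as 0xC0 or 0xF5) that UTF-8 forbids outright.
    """
    if byte < 0x80:
        return "single-byte character"      # 0xxxxxxx
    if byte < 0xC0:
        return "continuation byte"          # 10xxxxxx
    if byte < 0xE0:
        return "start of 2-byte sequence"   # 110xxxxx
    if byte < 0xF0:
        return "start of 3-byte sequence"   # 1110xxxx
    if byte < 0xF8:
        return "start of 4-byte sequence"   # 11110xxx
    return "invalid in UTF-8"


# Decoding ISO-8859-1 bytes as UTF-8 reproduces the error from the question.
try:
    latin1_bytes.decode("utf-8")
    error_message = ""
except UnicodeDecodeError as exc:
    error_message = str(exc)

# The opposite mistake produces the "Â°" seen in the question:
# UTF-8 bytes read as ISO-8859-1 turn 0xC2 into "Â" and 0xB0 into "°".
mojibake = utf8_bytes.decode("iso-8859-1")
```

So the two symptoms in the question probably come from files in two different encodings: a utf-8 file read as iso-8859-1 gives "Â°", and an iso-8859-1 file read as utf-8 raises the error.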

In any case: always decode with the same encoding that was used to encode the data. If you need characters outside a single code page, use utf-8. In general, using one of the utf encodings is good advice.
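If the CSV files come from different sources and you cannot control how they were saved, one possible workaround is to try utf-8 first and fall back to iso-8859-1. This is only a sketch: `decode_with_fallback` and `read_csv_rows` are hypothetical helper names, and the fallback order is an assumption about your data. iso-8859-1 must come last because it accepts every byte and therefore never raises an error, even when it is the wrong guess.

```python
import csv
import io


def decode_with_fallback(raw, encodings=("utf-8", "iso-8859-1")):
    """Return (text, encoding) for the first encoding that decodes raw bytes."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    raise ValueError("none of the candidate encodings could decode the data")


def read_csv_rows(path, encodings=("utf-8", "iso-8859-1"), delimiter=","):
    """Read one CSV file as a list of rows, guessing between the encodings."""
    with open(path, "rb") as handle:  # read raw bytes, decode afterwards
        raw = handle.read()
    text, _ = decode_with_fallback(raw, encodings)
    # newline="" leaves line endings untouched, as the csv module expects
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
```

Once you know the real encoding of a file, it is cleaner to pass it explicitly, for example `open(path, encoding="iso-8859-1", newline="")`, and to re-save the data as utf-8 so every file can be read the same way afterwards.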

Tags:

python

python-3.x

encode

character-encoding

utf-8

Lina S
