
Error: 'utf-8' codec can't decode byte 0xb0 in position 14: invalid start byte

I'm a beginner at Python and I would like to read multiple CSV files. When I read them with encoding = "ISO-8859-1", I get characters like this in my data: "D°faut". So I tried reading them as utf-8 instead, and I get this error: 'utf-8' codec can't decode byte 0xb0 in position 14: invalid start byte. Can someone help me, please? Thank you!

Lina S asked Sep 05 '25 00:09



1 Answer

A file has to be decoded with the same encoding it was written in. utf-8 stores basic Latin letters, digits and the usual symbols (the ASCII range) in one byte each, and needs two to four bytes for every other Unicode character. Since the file is read byte by byte, the decoder has to know how many bytes belong to the current character, and the high bits of each byte tell it: a byte starting with 0 is a complete single-byte character, a byte starting with 110, 1110 or 11110 starts a sequence of two, three or four bytes, and a byte starting with 10 is a continuation byte that can only appear inside such a sequence. 0xb0 is 1011 0000 in binary, so it is a continuation byte. The decoder found it where a new character should start, which is exactly what "invalid start byte" means. In other words, the file was not written as utf-8. In iso-8859-1 the single byte 0xb0 is the degree symbol (°), which is why you see "D°faut" with that encoding. In utf-8 the degree symbol would be stored as 0xC2 0xB0.
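The byte-level behaviour can be checked directly in a Python shell. This is a small sketch that only looks at the single byte 0xb0 from the error message, not at the asker's actual file; the variable names are made up for illustration.

```python
raw = b"\xb0"  # the byte reported in the error message

# Show the bit pattern of the byte: 0xb0 is 1011 0000.
print(format(raw[0], "08b"))

# Decoding the lone byte as utf-8 fails with the same reason
# that appears in the question's error message.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc.reason
print(utf8_error)

# iso-8859-1 maps every byte to one character, so the same byte
# decodes without error, here to the degree symbol.
latin1_text = raw.decode("iso-8859-1")
print(latin1_text)

# Encoding the degree symbol as utf-8 produces two bytes.
degree_utf8 = "°".encode("utf-8")
print(degree_utf8)
```

Note that iso-8859-1 never raises a decode error, because all 256 byte values are valid in it. Getting no error therefore does not prove the file really was written in iso-8859-1.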

In any case: always decode with the same encoding the file was encoded with. If you need characters outside the code page, use utf-8. In general, using one of the utf encodings is good advice.
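If the goal is to end up with utf-8 files, one option is to read each file with its original encoding and write it back as utf-8. The following is a minimal sketch, assuming the source really is iso-8859-1; the `transcode` helper, the file names and the sample content are hypothetical and only there to make the example runnable.

```python
import os
import tempfile


def transcode(src_path, dst_path, src_encoding, dst_encoding="utf-8"):
    """Read src_path as src_encoding and rewrite it as dst_encoding."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)


# Build a small iso-8859-1 sample file (hypothetical content).
workdir = tempfile.mkdtemp()
legacy = os.path.join(workdir, "legacy.csv")
converted = os.path.join(workdir, "converted.csv")
with open(legacy, "wb") as f:
    f.write(b"name,value\ntemp,21\xb0C\n")

# Decode with the encoding the file was written in, re-encode as utf-8.
transcode(legacy, converted, "iso-8859-1")

with open(converted, "rb") as f:
    converted_bytes = f.read()
with open(converted, "r", encoding="utf-8") as f:
    converted_text = f.read()
print(converted_text)
```

After the conversion, the file can be opened with encoding="utf-8" without the decode error, because the byte 0xb0 is now preceded by 0xc2.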

mk1x86 answered Sep 07 '25 19:09
