I'm a beginner at Python, and I would like to read multiple CSV files. When I read them with encoding = "ISO-8859-1",
I get this kind of characters in my data: "D°faut". So I tried reading them as utf-8 instead,
and I get this error: 'utf-8' codec can't decode byte 0xb0 in position 14: invalid start byte.
Can someone help me, please?
Thank you !
If you decode with utf-8, the file must also have been encoded with utf-8. UTF-8 needs multiple bytes to store any character outside ASCII (basically everything except for basic Latin letters, digits and the usual symbols). Since the file is read byte by byte, the decoder has to know what role each byte plays, and this is indicated by its leading bits: 0xxxxxxx is a complete single-byte character, 110xxxxx, 1110xxxx and 11110xxx start a sequence of two, three or four bytes, and 10xxxxxx is a continuation byte that may only follow such a start byte. 0xb0 translates to 1011 0000 in binary, so it is a continuation byte. In your file it appears at a position where a new character should start, which is why the decoder reports "invalid start byte". The file was evidently not written as utf-8 but with a single-byte encoding such as iso-8859-1, where 0xb0 is a complete character on its own (the degree symbol ° in iso-8859-1), so decoding it as utf-8 fails. In utf-8, the degree symbol (°) would be encoded as the two bytes 0xC2 0xB0.
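To make the byte-level explanation concrete, here is a small sketch using only built-in string and bytes methods. It encodes the degree symbol both ways and then tries to decode the single iso-8859-1 byte as utf-8. The variable names are only illustrative.

```python
degree = "°"

# The same character encoded with two different encodings.
utf8_bytes = degree.encode("utf-8")          # two bytes: 0xC2 0xB0
latin1_bytes = degree.encode("iso-8859-1")   # one byte: 0xB0

# A UTF-8 continuation byte has the bit pattern 10xxxxxx.
is_continuation = (0xB0 & 0b1100_0000) == 0b1000_0000

# Decoding the lone 0xB0 byte as UTF-8 fails, because a continuation
# byte cannot begin a character.
try:
    latin1_bytes.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(utf8_bytes, latin1_bytes, is_continuation)
print(error_message)
```

The error printed here has the same "invalid start byte" reason as the one in the question, only with a different position.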
In any case: always decode with the same encoding that was used to encode the file. If you need characters outside a single code page, use utf-8. In general, using one of the UTF encodings is good advice.
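As a rough illustration of matching the encoding on both sides, the sketch below writes a small CSV file with iso-8859-1 and reads it back twice using the standard csv module. The helper names write_csv and read_csv, the file name and the sample rows are made up for this example, and it assumes you know (or can find out) which encoding produced your files.

```python
import csv
import tempfile
from pathlib import Path


def write_csv(path, rows, encoding):
    """Write rows to a CSV file using the given text encoding."""
    with open(path, "w", newline="", encoding=encoding) as f:
        csv.writer(f).writerows(rows)


def read_csv(path, encoding):
    """Read all rows of a CSV file using the given text encoding."""
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.reader(f))


rows = [["label", "value"], ["Défaut", "20°"]]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.csv"
    write_csv(path, rows, "iso-8859-1")

    # Same encoding on both sides: the text comes back unchanged.
    same = read_csv(path, "iso-8859-1")

    # Mismatched encoding: the bytes are not valid UTF-8.
    try:
        read_csv(path, "utf-8")
        mismatch_failed = False
    except UnicodeDecodeError:
        mismatch_failed = True

print(same)
print(mismatch_failed)
```

If the text still looks wrong after decoding without an error, as with "D°faut", the file was most likely written with yet another encoding, and you would need to find out which one from whatever produced the files.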