
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 35: invalid start byte

Tags: python, pandas, csv

I am new to Python and am trying to read a CSV file with the script below.

import pandas as pd
Past = pd.read_csv("C:/Users/Admin/Desktop/Python/Past.csv", encoding='utf-8')

However, I get the error "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 35: invalid start byte". Please help me understand the issue. I passed encoding in the script because I thought it would resolve the error.

user3734568 asked Aug 06 '17 07:08


People also ask

What does "'utf-8' codec can't decode byte" mean?

The Python "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte" occurs when a bytes object is decoded with an encoding other than the one it was written in. To solve the error, specify the correct encoding, e.g. utf-16, or open the file in binary mode (rb or wb) so that no decoding takes place.
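As a rough illustration of the point above, the sketch below builds a few bytes of UTF-16 text by hand (the sample string "hi" is made up for the example), shows that decoding them as UTF-8 fails on the leading 0xff, and then decodes them with the matching codec.

```python
import codecs

# Hypothetical sample: UTF-16 little-endian text preceded by a byte order mark.
# The BOM is the two bytes 0xff 0xfe, so the very first byte is 0xff.
data = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")

try:
    data.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    print(exc)  # the message names byte 0xff at position 0

# The "utf-16" codec reads the BOM to pick the byte order and strips it.
text = data.decode("utf-16")
print(text)
```

The same idea applies to files: opening with mode "rb" returns the raw bytes, and decoding only happens once an encoding is chosen.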

What is byte 0x96?

In binary, 0x96 is 10010110. Any byte matching the pattern 10XXXXXX (0x80 to 0xBF) can only be a continuation byte, that is, the second or a subsequent byte of a multi-byte UTF-8 sequence, never the first. Hence the stream is either not UTF-8 or is corrupted.
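The bit pattern can be checked directly. The following is a small sketch using only built-ins; the variable names are illustrative. It also shows what the same byte means under cp1252, where 0x96 is the en dash (U+2013).

```python
b = 0x96

# Eight-bit binary representation of the byte.
bits = format(b, "08b")

# A UTF-8 continuation byte has its top two bits set to 10.
is_continuation = (b & 0b11000000) == 0b10000000

# In valid UTF-8 the en dash is a three-byte sequence, not a lone 0x96.
en_dash_utf8 = "\u2013".encode("utf-8")

# Under cp1252 the single byte 0x96 decodes to the en dash.
as_cp1252 = bytes([b]).decode("cp1252")

print(bits, is_continuation, en_dash_utf8.hex(), as_cp1252 == "\u2013")
```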

What is Unicode decode error in Python?

A UnicodeDecodeError normally happens when decoding a bytes object (a str in Python 2) with a particular codec. Since a codec maps only certain byte sequences to Unicode characters, an illegal byte sequence causes the codec-specific decode() to fail.
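The exception object itself carries the details needed to locate the offending byte. Here is a minimal sketch, assuming a made-up byte string that contains a lone 0x96:

```python
# Hypothetical input: ASCII text with a single cp1252 en dash (0x96) in it.
raw = b"Sales 2016 \x96 2017"

err = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    err = exc  # keep a reference; the name exc is cleared after the block

# Attributes describe the codec, the offset of the bad byte, and the reason.
print(err.encoding, err.start, err.reason)
print(hex(err.object[err.start]))
```

Looking at err.object[err.start] shows which byte failed, which is often enough to guess the real encoding.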


1 Answer

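This happens because the file is not encoded as UTF-8, so encoding='utf-8' is the wrong choice. Byte 0x96 is not a valid start byte in UTF-8, but in cp1252 it is the en dash.

Since you are working on a Windows machine, the file was most likely saved as cp1252, so just replacing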

Past=pd.read_csv("C:/Users/.../Past.csv",encoding='utf-8')  

with

Past=pd.read_csv("C:/Users/.../Past.csv",encoding='cp1252') 

should solve the problem.
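If the encoding is not known in advance, one option is to try a short list of candidates in order. The sketch below uses only the standard library; detect_encoding, the candidate list, and the temporary demo file are all hypothetical and not part of pandas. The returned name could then be passed as the encoding argument of pd.read_csv.

```python
import csv
import os
import tempfile


def detect_encoding(path, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes the whole file."""
    with open(path, "rb") as fh:
        raw = fh.read()
    for enc in candidates:
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the file")


# Demo with a made-up CSV written as cp1252, containing an en dash (0x96).
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write("name,period\nAlice,2016\u20132017\n".encode("cp1252"))
    demo_path = tmp.name

chosen = detect_encoding(demo_path)
with open(demo_path, encoding=chosen, newline="") as fh:
    rows = list(csv.reader(fh))
os.remove(demo_path)

print(chosen, rows)
```

A successful decode does not prove the guess is right: latin-1 accepts every byte, so it is placed last as a catch-all, and the decoded text is still worth a visual check.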

Liam answered Sep 18 '22 10:09