How to solve UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte in python

Question

I scraped some data and had to save the DataFrame as UTF-16 (Unicode), because the Latin/Spanish words were displayed incorrectly when I saved it as UTF-8. I used the following code to save the DataFrame:

 df.to_csv("blogdata.csv", encoding="utf-16", sep="\t", index=False)

When I try to read the file back in to clean the data, using the following code:

 blogdata = pd.read_csv('C:/Users/hyoungm/Downloads/blogdata.csv')

it raises the following error:

UnicodeDecodeError                        Traceback (most recent call last)
in ()
----> 1 blogdata = pd.read_csv('C:/Users/hyoungm/Downloads/blogdata.csv')

...

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._get_header()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

I don't know how to save the original data without losing the Latin/Spanish words inside the English sentences, nor how to read a Unicode data file. Can anybody please help me solve this issue?

Thank you very much!

Helen Batson · Accepted Answer

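There is a Python library that may help when the encoding is unknown: chardet

import chardet

with open(filename, 'rb') as file:
    print(chardet.detect(file.read()))

chardet.detect guesses the encoding from the raw bytes and returns a dictionary containing the guessed encoding and a confidence score. The 'rb' mode opens the file as binary, so Python does not try to decode it while reading. Once you know the encoding, pass it to pd.read_csv through the encoding argument.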
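For the file in the question specifically, byte 0xff at position 0 is most likely the first byte of the UTF-16 little-endian byte order mark (FF FE), which Python writes when saving with encoding="utf-16" on a little-endian machine such as a typical Windows PC. Reading with the same encoding and separator that were passed to to_csv, i.e. pd.read_csv(path, encoding="utf-16", sep="\t"), should therefore work. If chardet is not installed, a rough alternative is to check the byte order mark yourself. The sketch below is only an illustration: sniff_bom and read_tab_separated are hypothetical helper names, not part of pandas or chardet, and a file without a BOM simply falls back to the default you pass in.

```python
import codecs

# Byte order marks to look for. The UTF-32 marks come first because the
# UTF-32 LE mark (FF FE 00 00) begins with the UTF-16 LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(path, default="utf-8"):
    """Return a codec name based on the file's byte order mark, else `default`.

    Hypothetical helper: it only inspects the first four bytes, so it cannot
    identify encodings that are written without a BOM.
    """
    with open(path, "rb") as f:
        head = f.read(4)
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default


def read_tab_separated(path):
    """Read a tab-separated file with pandas, using the sniffed encoding.

    Hypothetical helper: sep="\t" is an assumption that matches the
    to_csv call in the question.
    """
    import pandas as pd  # imported here so sniff_bom works without pandas

    return pd.read_csv(path, encoding=sniff_bom(path), sep="\t")
```

With these helpers, read_tab_separated('C:/Users/hyoungm/Downloads/blogdata.csv') would pick "utf-16" for a file saved as in the question. The generic "utf-16" and "utf-32" codec names are returned on purpose: when decoding, they consume the BOM and use it to choose the byte order.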

Tags:

python

Hyoungeun Moon
