I tried to use dataset = pandas.read_csv('filename')
to create a DataFrame, but it fails because one of the column headers is written in Hebrew.
I checked, and it is possible for a DataFrame to have a Hebrew word as column header.
dataset.columns = ['שלום', 'b','c','d','e']
But I want to import the data directly from the CSV file that contains the Hebrew word, and that is what fails.
I get this error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf9 in position 0: invalid start byte
How can I import the dataset into a DataFrame, including the Hebrew column header?
I used:
dataset = pd.read_csv('file_name.csv', encoding = "ISO-8859-8")
See https://docs.python.org/3/library/codecs.html#standard-encodings for the list of standard encodings.
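As a self-contained illustration of the call above, here is a minimal sketch. The sample rows are made up, and the bytes are built in memory with io.BytesIO rather than read from the real file, so the only thing taken from the question is the Hebrew header and the encoding argument.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real CSV file.
csv_text = "שלום,b,c\n1,2,3\n4,5,6\n"

# Encode it the way the problem file is assumed to be stored on disk.
raw_bytes = csv_text.encode("iso-8859-8")

# Passing the matching encoding lets pandas decode the Hebrew header.
dataset = pd.read_csv(io.BytesIO(raw_bytes), encoding="iso-8859-8")
print(list(dataset.columns))
```

With a real file, the io.BytesIO object would simply be replaced by the file path.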
Your file is not in UTF-8 encoding.
It is most likely in an 8-bit Hebrew code page such as ISO-8859-8 or Windows-1255 (cp1255).
The byte 0xf9 in the Hebrew code page is ש, which is the first character of the header in your example (it appears rightmost because Hebrew is written right to left).
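The byte-level claim can be checked directly in Python. This is only a quick sketch: the two codec names are the Hebrew code pages assumed to be the likely candidates, not something confirmed from the actual file.

```python
# The byte reported in the error message.
first_byte = bytes([0xF9])

# Decode it with two common 8-bit Hebrew code pages (assumed candidates).
decoded = {name: first_byte.decode(name) for name in ("iso-8859-8", "cp1255")}
print(decoded)

# The same byte is not a valid start of a UTF-8 sequence,
# which is why the default read_csv call fails.
utf8_error = None
try:
    first_byte.decode("utf-8")
except UnicodeDecodeError as exc:
    utf8_error = exc
    print(exc)
```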
You'll have to pass the encoding parameter with the correct code page.
As for how to check your encoding, here is a simple trick that might be of use:
Open the file in Notepad and go to File -> Save As. Next to the Save button there is an encoding drop-down, and the file's current encoding will be preselected there.
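If Notepad is not available, a rough check can also be done in Python by trying a few candidate encodings in order. This is a heuristic sketch, not a reliable detector: guess_encoding is a hypothetical helper, the candidate list is an assumption, and 8-bit code pages accept almost any byte sequence, so a successful decode does not prove the guess is right.

```python
def guess_encoding(raw, candidates=("utf-8", "cp1255", "iso-8859-8")):
    """Return the first candidate encoding that decodes raw without error."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# Hypothetical header line, encoded two different ways for comparison.
hebrew_header = "שלום,b,c\n"
print(guess_encoding(hebrew_header.encode("cp1255")))
print(guess_encoding(hebrew_header.encode("utf-8")))
```

For a real file, the bytes would come from open('file_name.csv', 'rb').read(), and the result can be passed to the encoding parameter of read_csv. UTF-8 is tried first because it is strict and rarely accepts text written in another encoding.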