
Python read csv with Hebrew header

I tried to use dataset = pandas.read_csv('filename') to build a DataFrame, but it fails because one of the column headers is written in Hebrew.

I checked, and a DataFrame can have a Hebrew word as a column header: dataset.columns = ['שלום', 'b', 'c', 'd', 'e'] works. What I want, though, is to import the data directly from the CSV that contains the Hebrew header, and that is what fails.

I get this error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf9 in position 0: invalid start byte.

How can I import the dataset into a DataFrame with its column header intact?

Matan asked Nov 20 '17 14:11



3 Answers

I used:

import pandas as pd
dataset = pd.read_csv('file_name.csv', encoding="ISO-8859-8")

See https://docs.python.org/3/library/codecs.html#standard-encodings for the list of standard encodings.

user1875037 answered Sep 20 '22 00:09



Your file is not UTF-8 encoded.

Most likely it is in a single-byte Hebrew code page, such as Windows-1255 or ISO-8859-8.

In the Hebrew code page, 0xf9 is ש, the first character of the header in your example (it is displayed last, on the right, because Hebrew is written right to left).

You'll have to pass the encoding parameter with the correct code page.
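To illustrate the point about 0xf9, here is a small sketch using only Python's built-in codecs. It assumes the file was saved in one of the common single-byte Hebrew encodings (cp1255 or ISO-8859-8). The bytes are built by hand from the column name in the question, not read from the actual file.

```python
# Column name from the question, encoded the way a Windows-1255 file would store it.
# (Assumption: the real file uses cp1255 or ISO-8859-8; we only have the error message.)
header = "שלום"
raw = header.encode("cp1255")

# The first byte is 0xf9, the byte named in the error message.
first_byte = raw[0]

# 0xf9 is not a valid UTF-8 start byte, so decoding as UTF-8 fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Both Hebrew code pages decode the same bytes back to the original word.
decoded_cp1255 = raw.decode("cp1255")
decoded_iso = raw.decode("iso8859_8")
```

Since both code pages agree on the Hebrew letters, either name should work as the encoding argument for a header like this one.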

Danny_ds answered Sep 19 '22 00:09



As for how to check your file's encoding, here's a simple trick that might be of use:

Open the file in Notepad and go to File -> Save As. Next to the Save button there is an Encoding drop-down, and the file's current encoding will be preselected there.
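If Notepad isn't at hand, a rough programmatic alternative is to try a few candidate encodings and keep the first one that decodes the file. This is only a sketch: guess_encoding and the candidate list are made up for illustration, and because single-byte code pages accept almost any byte, a successful decode is a hint, not proof.

```python
import os
import tempfile

def guess_encoding(path, candidates=("utf-8", "cp1255", "iso8859_8")):
    """Return the first candidate that decodes the whole file, or None.

    Hypothetical helper: UTF-8 is tried first because it is the strictest
    and rejects most files written in a single-byte code page.
    """
    with open(path, "rb") as f:
        raw = f.read()
    for name in candidates:
        try:
            raw.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return None

# Demo: write a small CSV in Windows-1255 to a temporary file and check it.
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write("שלום,b,c\n1,2,3\n".encode("cp1255"))
    demo_path = tmp.name

detected = guess_encoding(demo_path)
os.remove(demo_path)
```

The returned name can then be passed along, e.g. pd.read_csv('file_name.csv', encoding=detected), after checking by eye that the header comes out readable.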

Itamar Mushkin answered Sep 21 '22 00:09
