Is there a way to encode the index of my dataframe? I have a dataframe whose index holds the names of international conferences.
df2= pd.DataFrame(index=df_conf['Conference'], columns=['Citation1991','Citation1992'])
I keep getting:
KeyError: 'Leitf\xc3\xa4den der angewandten Informatik'
whenever my code references a foreign conference name that contains non-ASCII letters.
I tried:
df.at[x.encode("utf-8"), 'col1']
df.at[x.encode('ascii', 'ignore'), 'col']
Is there a way around it? I tried to see if I could encode the dataframe itself when creating it, but it doesn't seem I can do that either.
The Series.str.encode() function encodes each character string in the Series/Index using the indicated encoding. It is equivalent to Python's built-in str.encode().
latin-1 is a type of encoding that is often used to work around a UnicodeDecodeError when reading a file in Python or pandas. It is a single-byte encoding that uses the code points 0 through 255; ASCII uses only 0 through 127, so it can encode half as many characters as latin-1.
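As a rough illustration of how these encodings treat the same name, here is a minimal Python 3 sketch. The variable names are made up for this example; the conference name is the one from the question.

```python
# The conference name from the question; \u00e4 is the letter "a with umlaut".
name = "Leitf\u00e4den der angewandten Informatik"

utf8_bytes = name.encode("utf-8")             # the umlaut becomes two bytes: 0xC3 0xA4
latin1_bytes = name.encode("latin-1")         # the umlaut becomes one byte: 0xE4
ascii_bytes = name.encode("ascii", "ignore")  # the umlaut is silently dropped

print(utf8_bytes)
print(latin1_bytes)
print(ascii_bytes)
```

Note that the UTF-8 bytes match the key shown in the KeyError above, while the ASCII version with "ignore" loses a character, so the three results are different keys as far as an index lookup is concerned.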
A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns. A pandas DataFrame consists of three principal components: the data, the rows, and the columns.
If you're not using csv, and you want to encode your string index, this is what worked for me:
df.index = df.index.str.encode('utf-8')
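A small sketch of how that could look end to end, assuming Python 3 and a toy dataframe invented for this example (the column values and the second conference name are placeholders, not data from the question). After encoding the index, lookups have to use byte-string keys encoded the same way.

```python
import pandas as pd

# Toy dataframe; the values are placeholders for illustration only.
df = pd.DataFrame(
    {"Citation1991": [3, 5]},
    index=["Leitf\u00e4den der angewandten Informatik", "ICML"],
)

# Encode every index label to UTF-8 bytes.
df.index = df.index.str.encode("utf-8")

# The lookup key must now be bytes in the same encoding.
key = "Leitf\u00e4den der angewandten Informatik".encode("utf-8")
value = df.at[key, "Citation1991"]
print(value)
```

The point is only that the index labels and the lookup key have to agree in both type and encoding; mixing text keys with a byte-string index would raise the same kind of KeyError.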
The encoding should be set when reading the input file, using the encoding option:
df = pd.read_csv('bibliography.csv', delimiter=',', encoding="utf-8")
or, if the file starts with a BOM:
df = pd.read_csv('bibliography.csv', delimiter=',', encoding="utf-8-sig")
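To try this without a file on disk, here is a hedged sketch that uses an in-memory buffer as a stand-in for bibliography.csv. The CSV content is invented for the example; only read_csv and its encoding option come from the answer above.

```python
import io
import pandas as pd

# Made-up CSV content; encoding with "utf-8-sig" prepends a BOM to the bytes.
text = "Conference,Citation1991\nLeitf\u00e4den der angewandten Informatik,3\n"
raw = text.encode("utf-8-sig")

# Read the bytes back, telling pandas to expect UTF-8 with a BOM.
df = pd.read_csv(io.BytesIO(raw), delimiter=",", encoding="utf-8-sig")

print(list(df.columns))
print(df.loc[0, "Conference"])
```

With the encoding declared up front, the conference names arrive as proper text strings, so they can be used as index labels without any manual encoding afterwards.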
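Just put a "u" prefix in front of the string literals so that they are unicode strings (this matters in Python 2, where plain literals are byte strings):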
df2= pd.DataFrame(index=df_conf[u'Conference'], columns=[u'Citation1991',u'Citation1992'])
It will work.