Pandas KeyError: value not in index

I have the following code,

df = pd.read_csv(CsvFileName)

p = df.pivot_table(index=['Hour'], columns='DOW', values='Changes', aggfunc=np.mean).round(0)
p.fillna(0, inplace=True)

p[["1Sun", "2Mon", "3Tue", "4Wed", "5Thu", "6Fri", "7Sat"]] = p[["1Sun", "2Mon", "3Tue", "4Wed", "5Thu", "6Fri", "7Sat"]].astype(int)

It always worked until I got a csv file that doesn't cover all the days of the week. For example, with the following .csv file,

DOW,Hour,Changes
4Wed,01,237
3Tue,07,2533
1Sun,01,240
3Tue,12,4407
1Sun,09,2204
1Sun,01,240
1Sun,01,241
1Sun,01,241
3Tue,11,662
4Wed,01,4
2Mon,18,4737
1Sun,15,240
2Mon,02,4
6Fri,01,1
1Sun,01,240
2Mon,19,2300
2Mon,19,2532

I'll get the following error:

KeyError: "['5Thu' '7Sat'] not in index" 

It seems to have a very easy fix, but I'm just too new to Python to know how to fix it.

xpt asked Jul 19 '16 15:07


People also ask

How do I fix Pandas KeyError?

We can simply fix the error by correcting the spelling of the key. If we are not sure about the spelling we can simply print the list of all column names and crosscheck.
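As a minimal sketch of that cross-check, assuming a small made-up DataFrame and a hypothetical list named wanted (neither comes from the question), the following prints the existing column names and collects the requested labels that are absent:

```python
import pandas as pd

# Made-up frame for illustration: only three weekday columns are present.
p = pd.DataFrame({"1Sun": [1.0], "2Mon": [2.0], "3Tue": [3.0]})

# Hypothetical list of the labels we intend to select.
wanted = ["1Sun", "2Mon", "3Tue", "5Thu"]

# Show what the frame actually contains.
print(list(p.columns))

# Labels that would trigger the KeyError if used in p[wanted].
missing = [c for c in wanted if c not in p.columns]
print(missing)
```

An empty missing list means the selection should succeed. Otherwise the list names the labels to fix or add.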

Why is Pandas not recognizing column name?

Typically this error occurs when you misspell a column name or include an accidental space before or after the column name.

How do I reindex a data frame?

You can reindex rows or columns with the reindex() method by specifying the axis you want to reindex. Labels in the new index that are not present in the DataFrame are filled with NaN by default.


2 Answers

Use reindex to get all the columns you need. It keeps the ones that are already there and adds the missing ones as empty (NaN) columns.

p = p.reindex(columns=['1Sun', '2Mon', '3Tue', '4Wed', '5Thu', '6Fri', '7Sat']) 

So, your entire code example should look like this:

df = pd.read_csv(CsvFileName)

p = df.pivot_table(index=['Hour'], columns='DOW', values='Changes', aggfunc=np.mean).round(0)

columns = ["1Sun", "2Mon", "3Tue", "4Wed", "5Thu", "6Fri", "7Sat"]
# Add the missing weekday columns first; they come back as NaN
p = p.reindex(columns=columns)
# Fill NaN after reindexing, because astype(int) cannot convert NaN
p.fillna(0, inplace=True)
p[columns] = p[columns].astype(int)
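For anyone who wants to try this without a file on disk, here is a self-contained sketch under a few assumptions: io.StringIO stands in for CsvFileName, the rows are the sample data from the question, aggfunc is passed as the string "mean" rather than np.mean, and reindex is given fill_value=0 so the added columns start at 0 instead of NaN. It is one possible variant, not the only way to write it.

```python
import io

import pandas as pd

# Sample data copied from the question; io.StringIO stands in for the CSV file.
csv_text = """DOW,Hour,Changes
4Wed,01,237
3Tue,07,2533
1Sun,01,240
3Tue,12,4407
1Sun,09,2204
1Sun,01,240
1Sun,01,241
1Sun,01,241
3Tue,11,662
4Wed,01,4
2Mon,18,4737
1Sun,15,240
2Mon,02,4
6Fri,01,1
1Sun,01,240
2Mon,19,2300
2Mon,19,2532
"""
df = pd.read_csv(io.StringIO(csv_text))

columns = ["1Sun", "2Mon", "3Tue", "4Wed", "5Thu", "6Fri", "7Sat"]

p = df.pivot_table(index="Hour", columns="DOW", values="Changes", aggfunc="mean").round(0)

# fill_value=0 fills the columns that reindex adds; fillna(0) handles the
# NaN cells the pivot itself produced for hour/day pairs with no rows.
p = p.reindex(columns=columns, fill_value=0).fillna(0).astype(int)

print(p)
```

With this data the 5Thu and 7Sat columns are absent from the pivot, so they appear in the result filled with zeros, and the whole table ends up with an integer dtype.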
piRSquared answered Sep 24 '22 15:09


I had a very similar issue. I got the same error because the csv contained spaces in the header. My csv contained a header "Gender " and I had it listed as:

[['Gender']] 

If it's easy enough for you to edit your csv, you can use the Excel function TRIM() to strip the extra spaces from the cells.

Or strip the spaces from the column names in pandas like this:

df.columns = df.columns.to_series().apply(lambda x: x.strip())
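A shorter spelling of the same idea is the string accessor on the column index. The sketch below assumes a made-up DataFrame with padded headers (the " Age" column is invented for illustration) and uses Index.str.strip() instead of the lambda:

```python
import pandas as pd

# Made-up frame whose headers carry stray spaces, as in "Gender ".
df = pd.DataFrame({"Gender ": ["F", "M"], " Age": [30, 41]})

# Index.str.strip() removes leading and trailing whitespace from each label.
df.columns = df.columns.str.strip()

# The selection that raised KeyError before the cleanup now works.
print(df[["Gender"]])
```

Both versions only touch leading and trailing whitespace, so spaces inside a column name are kept.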

ILikeWhiskey answered Sep 22 '22 15:09