
Why does pandas `loc` throw `KeyError` with column name?

Tags:

python

pandas

I have a data frame that is given this initial construct:

df_data = pd.DataFrame(columns=['name','date','c1','c2']).set_index(['name','date'])

I then have code to fill this frame from a database. I can print some or all of the frame and get a sensible result, something like:

print(df_data.c1.head(3))

name date
Joe  2019-01-01 234324
     2019-01-02 4565
     2019-01-03 573
Name: c1, dtype: object

After filling from the database, I have various analysis calculations that try to access the data using loc, for example df_data.loc['Joe', 'c1']. I expect to get a result from that with date as the index and the values of column c1, where the "name" part of the MultiIndex has been selected down to 'Joe'. Something like:

print(df_data.loc['Joe', 'c1'])

date
2019-01-01 234324
2019-01-02 4565
2019-01-03 573
Name: c1, dtype: object

I've run this three times, filling the frame with different date ranges. Two of the three work as expected and described above. In the third, I get KeyError: ('Joe', 'c1') for df_data.loc['Joe', 'c1']. Even in this "broken" case, though, I get a perfectly nice result for df_data.loc['Joe'].c1, which I think should give the same answer. I can also print the entire frame df_data and get a perfectly sensible result. I interpret the KeyError to mean that pandas thinks c1 should be in the index rather than treating it as a column name.

I cannot reproduce this in a stand-alone example because, for reasons I cannot understand, the result seems to depend on the data in the frame rather than on the structure of the frame. (The same structure "works" in two of the three cases.) So, my specific questions:

  • Why or under what circumstances would the syntax loc['Joe', 'c1'] cause c1 to be treated as part of the key instead of a column name? (Whatever other error I may have, I don't see where the second argument here should be interpreted as part of the key under any documented scenario, e.g. I do not have something like loc[('Joe','c1')].)
  • Are there known or documented cases where something about the data in the frame could cause such a change in how the data access call is interpreted?
Brick asked Nov 15 '22 21:11


1 Answer

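Use tuple notation for the row key and pass the column label separately: df_data.loc[('Joe',), 'c1']. In Python, df_data.loc['Joe', 'c1'] already hands the single tuple ('Joe', 'c1') to loc, so wrapping both labels in parentheses changes nothing. With a MultiIndex on the rows, pandas may try to resolve that whole tuple as a row key, which the user guide describes as an ambiguous case. See: https://pandas.pydata.org/docs/user_guide/advanced.html#advanced-indexing-with-hierarchical-index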
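To make the ambiguity concrete, here is a minimal sketch. The sample rows (including the extra name "Ann") and the `ShowKey` helper are hypothetical and only stand in for the frame in the question. It first checks that Python passes the same tuple for `['Joe', 'c1']` and `[('Joe', 'c1')]`, then shows three selections that name the row level and the column separately, so there is nothing left for pandas to guess.

```python
import pandas as pd

# Hypothetical sample data shaped like the frame in the question.
df_data = (
    pd.DataFrame(
        {
            "name": ["Joe", "Joe", "Joe", "Ann"],
            "date": pd.to_datetime(
                ["2019-01-01", "2019-01-02", "2019-01-03", "2019-01-01"]
            ),
            "c1": [234324, 4565, 573, 10],
            "c2": [1, 2, 3, 4],
        }
    )
    .set_index(["name", "date"])
    .sort_index()  # label slicing on a MultiIndex expects a sorted index
)


class ShowKey:
    """Hypothetical helper: returns whatever key Python passes to []."""

    def __getitem__(self, key):
        return key


# Both spellings deliver the same tuple, so parentheses alone change nothing.
same_key = ShowKey()["Joe", "c1"] == ShowKey()[("Joe", "c1")]

# 1. Cross-section on the named level, then pick the column.
by_xs = df_data.xs("Joe", level="name")["c1"]

# 2. Spell out every row level with IndexSlice, then give the column.
by_slice = df_data.loc[pd.IndexSlice["Joe", :], "c1"]

# 3. Two-step selection, the form that already worked in the question.
by_chain = df_data.loc["Joe"]["c1"]

print(same_key)
print(by_xs)
```

The `xs` and two-step forms return a Series indexed by date only, while the `IndexSlice` form keeps both index levels. If the failure really is data dependent, it may also be worth checking `df_data.index.is_monotonic_increasing` and `df_data.index.is_unique` on the frame that raises, since those are properties of the data rather than of the structure.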

Mose Wintner answered Dec 19 '22 11:12