 

MultiIndex Slicing requires the index to be fully lexsorted

Tags: python, pandas

I have a data frame with index (year, foo), where I would like to select the X largest observations of foo where year == someYear.

My approach was

df.sort_index(level=[0, 1], ascending=[1, 0], inplace=True)
df.loc[pd.IndexSlice[2002, :10], :]

but I get

KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)'

I tried different variants of sorting (e.g. ascending = [0, 0]), but they all resulted in some sort of error.

If I only wanted the xth row, I could use df.groupby(level=[0]).nth(x) after sorting, but since I want a set of rows, that doesn't feel very efficient.

What's the best way to select these rows? Some data to play with:

                   rank_int  rank
year foo                         
2015 1.381845             2   320
     1.234795             2   259
     1.148488           199     2
     0.866704             2   363
     0.738022             2   319
FooBar asked Oct 05 '16 14:10




2 Answers

ascending should be a boolean or a list of booleans, not a list of integers. Label slicing on a MultiIndex also needs every level sorted in ascending order, so try sorting this way:

df.sort_index(ascending=True, inplace=True)
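As a minimal sketch of the effect, assuming a frame rebuilt from the five sample rows in the question (the slice bound 1.148488 is picked only for illustration), a fully ascending sort leaves the index lexsorted, after which a label slice on the second level works:

```python
import pandas as pd

# Sample rows copied from the question.
index = pd.MultiIndex.from_tuples(
    [(2015, 1.381845), (2015, 1.234795), (2015, 1.148488),
     (2015, 0.866704), (2015, 0.738022)],
    names=["year", "foo"],
)
df = pd.DataFrame(
    {"rank_int": [2, 2, 199, 2, 2], "rank": [320, 259, 2, 363, 319]},
    index=index,
)

# Sort every level in ascending order so the index is lexsorted.
df = df.sort_index(ascending=True)
print(df.index.is_monotonic_increasing)

# .loc slices by label: this keeps the rows whose foo label is at most
# 1.148488 (the bound is inclusive), not the first N rows.
subset = df.loc[pd.IndexSlice[2015, :1.148488], :]
print(subset)
```

Note that the slice selects by foo value, so this alone does not return "the X largest" rows; it only shows that the KeyError goes away once the index is sorted ascending.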

ASGM answered Sep 19 '22 14:09


First, sort like this:

df.sort_index(level=['year','foo'], ascending=[1, 0], inplace=True)

This should fix the KeyError, but df.loc[pd.IndexSlice[2002, :10], :] still won't give you the result you expect. loc is label-based, unlike iloc, so :10 is read as a slice over the foo labels (values up to and including 10), not as the first ten rows. The inner levels of a MultiIndex can't be sliced by position this way, so I would suggest using groupby. If you already have this MultiIndex, you can do:

df = df.reset_index()  # reset_index returns a new frame, so assign it
df = df.sort_values(by=['year','foo'], ascending=[True, False])
df.groupby('year').head(10)

If you need the n entries with the smallest foo, you can use tail(n) instead. If you need, say, the first, third and fifth entries, you can use nth([0,2,4]) as you mentioned in the question. I think it's the most efficient way one could do it.
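The groupby approach can be sketched end to end as follows. This is only an illustration built from the five sample rows in the question; the names flat, top_n, bottom_n and the value n = 3 are assumptions made for the example. With several years in the data, the same calls return n rows per year.

```python
import pandas as pd

# Sample rows copied from the question, rebuilt with a (year, foo) index.
df = pd.DataFrame(
    {
        "year": [2015] * 5,
        "foo": [1.381845, 1.234795, 1.148488, 0.866704, 0.738022],
        "rank_int": [2, 2, 199, 2, 2],
        "rank": [320, 259, 2, 363, 319],
    }
).set_index(["year", "foo"])

n = 3  # hypothetical number of rows to keep per year

# Move the index levels back into columns, then sort by year ascending
# and foo descending so the largest foo values come first in each year.
flat = df.reset_index()
flat = flat.sort_values(by=["year", "foo"], ascending=[True, False])

top_n = flat.groupby("year").head(n)     # n largest foo per year
bottom_n = flat.groupby("year").tail(n)  # n smallest foo per year
print(top_n)
print(bottom_n)
```

If the (year, foo) index is needed afterwards, calling set_index(["year", "foo"]) on the result restores it.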

Danila Savenkov answered Sep 21 '22 14:09