 

`.loc` and `.iloc` with MultiIndex'd DataFrame

When indexing a MultiIndex-ed DataFrame, it seems like .iloc assumes you're referencing the "inner level" of the index while .loc looks at the outer level.

For example:

import numpy as np
import pandas as pd

np.random.seed(123)
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
idx = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=idx)

# .loc looks at the outer index:

print(df.loc['qux'])
# df.loc['two'] would throw KeyError
              0        1        2        3
second                                    
one    -1.25388 -0.63775  0.90711 -1.42868
two    -0.14007 -0.86175 -0.25562 -2.79859

# while .iloc looks at the inner index:

print(df.iloc[-1])
0   -0.14007
1   -0.86175
2   -0.25562
3   -2.79859
Name: (qux, two), dtype: float64

Two questions:

Firstly, why is this? Is it a deliberate design decision?

Secondly, can I use .iloc to reference the outer level of the index, to yield the result below? I'm aware I could first find the last member of the index with get_level_values and then .loc-index with that, but I'm wondering whether it can be done more directly, either with some funky .iloc syntax or with an existing function designed specifically for this case.

# desired result of an outer-level df.iloc[-1]
                    0        1        2        3
first second                                    
qux   one    -1.25388 -0.63775  0.90711 -1.42868
      two    -0.14007 -0.86175 -0.25562 -2.79859
Brad Solomon asked Aug 30 '17 18:08


2 Answers

Yes, this is a deliberate design decision:

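.iloc is a strict positional indexer, it does not regard the structure at all, only the first actual behavior. ... .loc does take into account the level behavior. [emphasis added]

In other words, df.iloc[-1] is not looking at the inner level: it simply returns the last of the eight rows, whose label happens to be the tuple (qux, two).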
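A quick check of this, as a minimal sketch that rebuilds the DataFrame from the question (the names last_by_position and last_by_label are only illustrative):

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame from the question.
np.random.seed(123)
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
idx = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=idx)

# .iloc ignores the labels: -1 is just the last row position.
last_by_position = df.iloc[-1]
print(df.index[-1])  # ('qux', 'two')

# .loc with the full label tuple picks out the same row.
last_by_label = df.loc[('qux', 'two')]
print(last_by_position.equals(last_by_label))  # True

# .loc with only an outer label uses the level structure and
# returns both rows of the 'qux' group.
print(df.loc['qux'].shape)  # (2, 4)
```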

So the result desired in the question cannot be obtained directly with .iloc. The closest workaround, used in answers to several similar questions, is:

print(df.loc[[df.index.get_level_values(0)[-1]]])
                    0        1        2        3
first second                                    
qux   one    -1.25388 -0.63775  0.90711 -1.42868
      two    -0.14007 -0.86175 -0.25562 -2.79859

Using double brackets (passing a list containing the label) retains the first index level in the result.
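The difference between a scalar label and a one-element list can be checked directly. A minimal sketch on the question's DataFrame; last_outer, single and double are illustrative names:

```python
import numpy as np
import pandas as pd

np.random.seed(123)
idx = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']], names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=idx)

# Last outer-level label, taken in row order.
last_outer = df.index.get_level_values(0)[-1]  # 'qux'

# A scalar label drops the matched level from the result...
single = df.loc[last_outer]
print(list(single.index.names))  # ['second']

# ...while a list of labels keeps both levels.
double = df.loc[[last_outer]]
print(list(double.index.names))  # ['first', 'second']
```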

Brad Solomon answered Sep 22 '22 02:09


You can use:

df.iloc[[6, 7], :]
Out[1]:
                     0         1         2         3
first second
qux   one    -1.253881 -0.637752  0.907105 -1.428681
      two    -0.140069 -0.861755 -0.255619 -2.798589

Here 6 and 7 are the integer positions of these two rows, as the output of reset_index() shows:

df.reset_index()
Out[]:
  first second         0         1         2         3
0   bar    one -1.085631  0.997345  0.282978 -1.506295
1   bar    two -0.578600  1.651437 -2.426679 -0.428913
2   baz    one  1.265936 -0.866740 -0.678886 -0.094709
3   baz    two  1.491390 -0.638902 -0.443982 -0.434351
4   foo    one  2.205930  2.186786  1.004054  0.386186
5   foo    two  0.737369  1.490732 -0.935834  1.175829
6   qux    one -1.253881 -0.637752  0.907105 -1.428681
7   qux    two -0.140069 -0.861755 -0.255619 -2.798589

This also works with df.iloc[[-2, -1], :] or df.iloc[range(-2, 0), :].
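Hard-coding 6 and 7 only works when the positions are known in advance. The sketch below (assuming the same DataFrame) checks that the positional forms agree and derives the positions of the 'qux' rows from the first index level with np.flatnonzero instead:

```python
import numpy as np
import pandas as pd

np.random.seed(123)
idx = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']], names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=idx)

# These all select the last two rows purely by position.
a = df.iloc[[6, 7], :]
b = df.iloc[[-2, -1], :]
d = df.iloc[-2:]  # a plain slice does the same

# Look up the row positions belonging to an outer label
# instead of hard-coding them.
positions = np.flatnonzero(df.index.get_level_values(0) == 'qux')
print(positions)  # [6 7]
e = df.iloc[positions]
```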


EDIT: Turning it into a more generic solution

This can then be wrapped in a generic function:

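def multiindex_iloc(df, index):
    # Outer-level labels in order of appearance (index.levels[0] is sorted
    # and can still contain labels dropped by an earlier slice).
    label = df.index.get_level_values(0).unique()[index]
    # get_loc returns a slice (or boolean mask) covering that label's rows.
    return df.iloc[df.index.get_loc(label)]

multiindex_iloc(df, -1)
Out[]:
                     0         1         2         3
first second
qux   one    -1.253881 -0.637752  0.907105 -1.428681
      two    -0.140069 -0.861755 -0.255619 -2.798589


multiindex_iloc(df, 2)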
Out[]:
                     0         1         2         3
first second
foo   one     2.205930  2.186786  1.004054  0.386186
      two     0.737369  1.490732 -0.935834  1.175829
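One possible extension, offered as a sketch rather than an established pandas API: the hypothetical helper outer_iloc below accepts an int, a slice or a list of outer positions and selects the matching groups with a boolean mask. It takes the outer labels in order of appearance via get_level_values(0).unique(), because index.levels[0] is sorted and can still hold labels that an earlier slice removed, as the last usage lines show.

```python
import numpy as np
import pandas as pd


def outer_iloc(df, pos):
    """Select whole outer-level groups of a MultiIndex'd DataFrame by position.

    Hypothetical helper: pos may be an int, a slice or a list of ints and is
    applied to the outer labels in their order of first appearance.
    """
    outer = df.index.get_level_values(0)
    # Positional lookup on the distinct outer labels (scalar or Index result).
    selected = outer.unique()[pos]
    # A scalar position yields one label; wrap it so isin() gets a list-like.
    if not isinstance(selected, pd.Index):
        selected = [selected]
    # Keep every row whose outer label was selected, in the original order.
    return df[outer.isin(selected)]


np.random.seed(123)
idx = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']], names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=idx)

print(outer_iloc(df, -1))           # the two 'qux' rows
print(outer_iloc(df, slice(0, 2)))  # the 'bar' and 'baz' groups (4 rows)

# After positional slicing, levels[0] still lists the dropped labels,
# but outer_iloc only considers the labels actually present.
sub = df.iloc[4:]
print(list(sub.index.levels[0]))    # ['bar', 'baz', 'foo', 'qux']
print(outer_iloc(sub, 0))           # the 'foo' group
```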
FabienP answered Sep 22 '22 02:09