Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Slicing a MultiIndex DataFrame with a condition based on the index [duplicate]

Tags:

python

pandas

I have a dataframe which looks like this:

import pandas as pd
import numpy as np

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']), np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame([[24, 13,  8,  9],
   [11, 30,  7, 23],
   [21, 31, 12, 30],
   [ 2,  5, 19, 24],
   [15, 18,  3, 16],
   [ 2, 24, 28, 11],
   [23,  9,  6, 12],
   [29, 28, 11, 21]], index=arrays, columns=list('abcd'))


df
          a   b   c   d
bar one  24  13   8   9
    two  11  30   7  23
baz one  21  31  12  30
    two   2   5  19  24
foo one  15  18   3  16
    two   2  24  28  11
qux one  23   9   6  12
    two  29  28  11  21

I want to slice the dataframe such that the results contains all rows which have foo as value for their first index and all the rows which have bar as first level index and two as second level index. I.e. the resulting dataframe shoud look like this:

          a   b   c   d
bar two  11  30   7  23
foo one  15  18   3  16
    two   2  24  28  11

One way to get this result is

pd.concat([df.loc[[('bar', 'two')],:], df.loc[('foo', slice(None)),:]])

but this feels like a very cumbersome way, there must be a more "pythonic" way..

like image 333
crs Avatar asked May 30 '18 15:05

crs


People also ask

How do you slice in MultiIndex?

You can slice a MultiIndex by providing multiple indexers. You can provide any of the selectors as if you are indexing by label, see Selection by Label, including slices, lists of labels, labels, and boolean indexers. You can use slice(None) to select all the contents of that level.

What does the pandas function MultiIndex From_tuples do?

from_tuples() function is used to convert list of tuples to MultiIndex. It is one of the several ways in which we construct a MultiIndex.

What is hierarchical indexing and why is it required explain how pandas can be used for creating hierarchical indexes?

So, the Hierarchy of indexes is formed that's why this is called Hierarchical indexing. We may sometimes need to make a column as an index, or we want to convert an index column into the normal column, so there is a pandas reset_index(inplace = True) function, which makes the index column the normal column.

How do I convert MultiIndex to single index in pandas?

To revert the index of the dataframe from multi-index to a single index using the Pandas inbuilt function reset_index(). Returns: (Data Frame or None) DataFrame with the new index or None if inplace=True.


2 Answers

query to the rescue:

df.query('ilevel_0 == "foo" or (ilevel_0 == "bar" and ilevel_1 == "two")')

          a   b   c   d
bar two  11  30   7  23
foo one  15  18   3  16
    two   2  24  28  11

xs, loc, etc all fail because your slicing across levels is not consistent.

like image 135
cs95 Avatar answered Nov 07 '22 18:11

cs95


You can use default slicing

l0 = df.index.get_level_values(0)
l1 = df.index.get_level_values(1)
cond = (l0 == "foo") | ((l0=="bar") & (l1=="two"))
df[cond]

Output

        a   b   c   d
bar two 11  30  7   23
foo one 15  18  3   16
    two 2   24  28  11
like image 38
rafaelc Avatar answered Nov 07 '22 20:11

rafaelc