I have a dataframe which looks like this:
import pandas as pd
import numpy as np
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']), np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame([[24, 13,  8,  9],
   [11, 30,  7, 23],
   [21, 31, 12, 30],
   [ 2,  5, 19, 24],
   [15, 18,  3, 16],
   [ 2, 24, 28, 11],
   [23,  9,  6, 12],
   [29, 28, 11, 21]], index=arrays, columns=list('abcd'))
df
          a   b   c   d
bar one  24  13   8   9
    two  11  30   7  23
baz one  21  31  12  30
    two   2   5  19  24
foo one  15  18   3  16
    two   2  24  28  11
qux one  23   9   6  12
    two  29  28  11  21
I want to slice the dataframe such that the results contains all rows which have foo as value for their first index and all the rows which have bar as first level index and two as second level index.
I.e. the resulting dataframe shoud look like this:
          a   b   c   d
bar two  11  30   7  23
foo one  15  18   3  16
    two   2  24  28  11
One way to get this result is
pd.concat([df.loc[[('bar', 'two')],:], df.loc[('foo', slice(None)),:]])
but this feels like a very cumbersome way, there must be a more "pythonic" way..
You can slice a MultiIndex by providing multiple indexers. You can provide any of the selectors as if you are indexing by label, see Selection by Label, including slices, lists of labels, labels, and boolean indexers. You can use slice(None) to select all the contents of that level.
from_tuples() function is used to convert list of tuples to MultiIndex. It is one of the several ways in which we construct a MultiIndex.
So, the Hierarchy of indexes is formed that's why this is called Hierarchical indexing. We may sometimes need to make a column as an index, or we want to convert an index column into the normal column, so there is a pandas reset_index(inplace = True) function, which makes the index column the normal column.
To revert the index of the dataframe from multi-index to a single index using the Pandas inbuilt function reset_index(). Returns: (Data Frame or None) DataFrame with the new index or None if inplace=True.
query to the rescue:
df.query('ilevel_0 == "foo" or (ilevel_0 == "bar" and ilevel_1 == "two")')
          a   b   c   d
bar two  11  30   7  23
foo one  15  18   3  16
    two   2  24  28  11
xs, loc, etc all fail because your slicing across levels is not consistent.
You can use default slicing
l0 = df.index.get_level_values(0)
l1 = df.index.get_level_values(1)
cond = (l0 == "foo") | ((l0=="bar") & (l1=="two"))
df[cond]
Output
        a   b   c   d
bar two 11  30  7   23
foo one 15  18  3   16
    two 2   24  28  11
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With