Slicing a MultiIndex DataFrame with a condition based on the index [duplicate]

Tags:

pandas

I have a dataframe which looks like this:

import pandas as pd
import numpy as np

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']), np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame([[24, 13,  8,  9],
   [11, 30,  7, 23],
   [21, 31, 12, 30],
   [ 2,  5, 19, 24],
   [15, 18,  3, 16],
   [ 2, 24, 28, 11],
   [23,  9,  6, 12],
   [29, 28, 11, 21]], index=arrays, columns=list('abcd'))


df
          a   b   c   d
bar one  24  13   8   9
    two  11  30   7  23
baz one  21  31  12  30
    two   2   5  19  24
foo one  15  18   3  16
    two   2  24  28  11
qux one  23   9   6  12
    two  29  28  11  21

I want to slice the dataframe so that the result contains all rows that have foo as their first-level index value, plus all rows that have bar as the first-level index and two as the second-level index. That is, the resulting dataframe should look like this:

          a   b   c   d
bar two  11  30   7  23
foo one  15  18   3  16
    two   2  24  28  11

One way to get this result is

pd.concat([df.loc[[('bar', 'two')],:], df.loc[('foo', slice(None)),:]])

but this feels very cumbersome; there must be a more "pythonic" way.

333

asked May 30 '18 15:05

crs

2 Answers

query to the rescue:

df.query('ilevel_0 == "foo" or (ilevel_0 == "bar" and ilevel_1 == "two")')

          a   b   c   d
bar two  11  30   7  23
foo one  15  18   3  16
    two   2  24  28  11

xs, loc and similar slicers are awkward here because the slice is not consistent across levels: foo takes every second-level value, while bar takes only two.
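If the index levels have names, query can refer to them directly instead of ilevel_0 and ilevel_1. A minimal sketch, assuming you are free to name the levels; the names outer and inner are made up for this example and are not in the original frame:

```python
import numpy as np
import pandas as pd

arrays = [
    np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
    np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
]
df = pd.DataFrame(
    [[24, 13, 8, 9],
     [11, 30, 7, 23],
     [21, 31, 12, 30],
     [2, 5, 19, 24],
     [15, 18, 3, 16],
     [2, 24, 28, 11],
     [23, 9, 6, 12],
     [29, 28, 11, 21]],
    index=arrays, columns=list('abcd'),
)

# Work on a copy so the original, unnamed index is left untouched.
named = df.copy()
# 'outer' and 'inner' are hypothetical level names chosen for illustration.
named.index.names = ['outer', 'inner']

# Same condition as above, written against the level names.
result = named.query('outer == "foo" or (outer == "bar" and inner == "two")')
print(result)
```

The condition is the same as before; only the way the levels are addressed changes, which tends to read better once the frame has more than two levels.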

135

answered Nov 07 '22 18:11

cs95

You can use plain boolean indexing on the index level values:

l0 = df.index.get_level_values(0)
l1 = df.index.get_level_values(1)
cond = (l0 == "foo") | ((l0 == "bar") & (l1 == "two"))
df[cond]

Output

          a   b   c   d
bar two  11  30   7  23
foo one  15  18   3  16
    two   2  24  28  11
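For anyone who wants to check that the mask picks the same rows as the concat from the question, here is a self-contained sketch. The variable names masked, concatenated and same are just for this example:

```python
import numpy as np
import pandas as pd

arrays = [
    np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
    np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
]
df = pd.DataFrame(
    [[24, 13, 8, 9],
     [11, 30, 7, 23],
     [21, 31, 12, 30],
     [2, 5, 19, 24],
     [15, 18, 3, 16],
     [2, 24, 28, 11],
     [23, 9, 6, 12],
     [29, 28, 11, 21]],
    index=arrays, columns=list('abcd'),
)

# Boolean mask built from the two index levels.
l0 = df.index.get_level_values(0)
l1 = df.index.get_level_values(1)
cond = (l0 == 'foo') | ((l0 == 'bar') & (l1 == 'two'))
masked = df[cond]

# The concat-based approach from the question, for comparison.
concatenated = pd.concat([
    df.loc[[('bar', 'two')], :],
    df.loc[('foo', slice(None)), :],
])

# Compare row labels and cell values of the two results.
same = (list(masked.index) == list(concatenated.index)
        and (masked.to_numpy() == concatenated.to_numpy()).all())
print(same)
```

The mask keeps rows in their original order, so it lines up with the concat here only because bar sorts before foo in the index.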

answered Nov 07 '22 20:11

rafaelc
