Slice pandas DataFrame by MultiIndex level or sublevel

Tags:

pandas

Inspired by this answer, and by the lack of an easy answer to this question, I found myself writing a little syntactic sugar to make filtering by MultiIndex level easier.

import pandas as pd


def _filter_series(x, level_name, filter_by):
    """
    Filter a pd.Series or pd.DataFrame x by `filter_by` on the MultiIndex level
    `level_name`

    Uses `pd.Index.get_level_values()` in the background. `filter_by` is either
    a string or an iterable.
    """
    if isinstance(x, (pd.Series, pd.DataFrame)):
        # Wrap a single string so that isin() receives a list-like
        if isinstance(filter_by, str):
            filter_by = [filter_by]

        # Boolean mask over the rows whose label on `level_name` matches
        mask = x.index.get_level_values(level_name).isin(filter_by)
        return x[mask]
    else:
        print("Not a pandas object")
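
To make the behaviour concrete, here is a minimal sketch of the same masking idea applied directly to a small Series. The sample data and the level names `outer` and `inner` are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical sample data: a Series with a two-level MultiIndex.
index = pd.MultiIndex.from_product(
    [["a", "b"], ["x", "y", "z"]], names=["outer", "inner"]
)
s = pd.Series(range(6), index=index)

# Boolean mask built from one level's values, as the function above does.
mask = s.index.get_level_values("inner").isin(["x", "z"])
filtered = s[mask]
print(filtered)
```

Because the mask is an ordinary boolean array, this approach does not need the index to be sorted.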

But if I know the pandas development team (and I'm starting to, slowly!) there's already a nice way to do this, and I just don't know what it is yet!

Am I right?

asked Apr 10 '14 11:04

2 Answers

I actually upvoted joris's answer, but unfortunately the refactoring he mentions did not happen in 0.14 and is not happening in 0.17 either. So for the moment, let me suggest a quick and dirty solution (obviously derived from Jeff's):

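import pandas as pd


def filter_by(df, constraints):
    """Filter MultiIndex by sublevels."""
    # Use the constraint for each named level, or select everything on it.
    indexer = tuple(constraints.get(name, slice(None))
                    for name in df.index.names)
    # A Series takes the row indexer alone; a DataFrame also needs a column slice.
    return df.loc[indexer] if df.ndim == 1 else df.loc[indexer, :]


# Attach as a method so it can be called on any Series or DataFrame.
pd.Series.filter_by = filter_by
pd.DataFrame.filter_by = filter_by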

... to be used as

df.filter_by({'level_name' : value})

where value can indeed be a single value, but also a list or a slice.

(untested with Panels and higher dimension elements, but I do expect it to work)
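
As a rough illustration of the three kinds of constraint mentioned above, the sketch below calls a compact restatement of the helper as a plain function (no monkey-patching). The sample frame and its level names are assumptions made for the example; the index is sorted because label slices on a MultiIndex generally require that.

```python
import numpy as np
import pandas as pd


def filter_by(df, constraints):
    """Filter a MultiIndex by sublevels (same idea as the helper above)."""
    indexer = tuple(constraints.get(name, slice(None)) for name in df.index.names)
    return df.loc[indexer] if df.ndim == 1 else df.loc[indexer, :]


# Hypothetical sample frame; sorted so that label slices work.
index = pd.MultiIndex.from_product(
    [["A0", "A1"], ["B0", "B1", "B2"]], names=["first", "second"]
)
df = pd.DataFrame({"value": np.arange(len(index))}, index=index).sort_index()

single = filter_by(df, {"second": "B1"})                # one label
several = filter_by(df, {"second": ["B0", "B2"]})       # list of labels
ranged = filter_by(df, {"second": slice("B0", "B1")})   # label slice, end inclusive

# The same call on a Series takes the branch without the column slice.
series_rows = filter_by(df["value"], {"second": ["B0", "B2"]})
```

Levels that are not named in the dictionary are left unconstrained, which is what `slice(None)` expresses.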

answered Sep 19 '22 05:09

Pietro Battiston

This is very easy using the new multi-index slicers in master/0.14 (releasing soon); see here.
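
For readers who have not seen the slicers, here is a minimal sketch using `pd.IndexSlice`, which lets the per-level selection be written with ordinary slice syntax. The small frame is a made-up example, not the one used later in this answer.

```python
import numpy as np
import pandas as pd

# Hypothetical three-level index, sorted so that the slicers work.
index = pd.MultiIndex.from_product(
    [["A0", "A1"], ["B0", "B1"], ["C0", "C1", "C2", "C3"]],
    names=["first", "second", "third"],
)
df = pd.DataFrame({"value": np.arange(len(index))}, index=index).sort_index()

idx = pd.IndexSlice
# Everything on 'first' and 'second', only C1 and C3 on 'third', all columns.
subset = df.loc[idx[:, :, ["C1", "C3"]], :]
print(subset)
```

The trailing `, :` selects all columns and keeps the row indexer from being mistaken for a row/column pair.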

There is an open issue to make this syntactically easier (it's not hard to do), see here. Something like df.loc[{'third': ['C1', 'C3']}] seems reasonable to me.

Here's how you can do it (requires master/0.14):

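In [1]: import numpy as np
   ...: from pandas import DataFrame, MultiIndex

In [2]: def mklbl(prefix, n):
   ...:     return ["%s%s" % (prefix, i) for i in range(n)]
   ...:

In [11]: index = MultiIndex.from_product([mklbl('A', 4),
    ...:                                   mklbl('B', 2),
    ...:                                   mklbl('C', 4),
    ...:                                   mklbl('D', 2)],
    ...:                                  names=['first', 'second', 'third', 'fourth'])

In [12]: columns = ['value']

In [13]: # sort_index() replaces the original sortlevel(), which was removed in pandas 1.0
    ...: df = DataFrame(np.arange(len(index) * len(columns)).reshape((len(index), len(columns))),
    ...:                index=index, columns=columns).sort_index()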

In [14]: df
Out[14]: 
                           value
first second third fourth       
A0    B0     C0    D0          0
                   D1          1
             C1    D0          2
                   D1          3
             C2    D0          4
                   D1          5
             C3    D0          6
                   D1          7
      B1     C0    D0          8
                   D1          9
             C1    D0         10
                   D1         11
             C2    D0         12
                   D1         13
             C3    D0         14
                   D1         15
A1    B0     C0    D0         16
                   D1         17
             C1    D0         18
                   D1         19
             C2    D0         20
                   D1         21
             C3    D0         22
                   D1         23
      B1     C0    D0         24
                   D1         25
             C1    D0         26
                   D1         27
             C2    D0         28
                   D1         29
             C3    D0         30
                   D1         31
A2    B0     C0    D0         32
                   D1         33
             C1    D0         34
                   D1         35
             C2    D0         36
                   D1         37
             C3    D0         38
                   D1         39
      B1     C0    D0         40
                   D1         41
             C1    D0         42
                   D1         43
             C2    D0         44
                   D1         45
             C3    D0         46
                   D1         47
A3    B0     C0    D0         48
                   D1         49
             C1    D0         50
                   D1         51
             C2    D0         52
                   D1         53
             C3    D0         54
                   D1         55
      B1     C0    D0         56
                   D1         57
             C1    D0         58
                   D1         59
                             ...

[64 rows x 1 columns]

Create an indexer across all of the levels, selecting all entries

In [15]: indexer = [slice(None)]*len(df.index.names)

Make the level we care about only have the entries we care about

In [16]: indexer[df.index.names.index('third')] = ['C1','C3']

Select it (it's important that this is a tuple!)

In [18]: df.loc[tuple(indexer),:]
Out[18]: 
                           value
first second third fourth       
A0    B0     C1    D0          2
                   D1          3
             C3    D0          6
                   D1          7
      B1     C1    D0         10
                   D1         11
             C3    D0         14
                   D1         15
A1    B0     C1    D0         18
                   D1         19
             C3    D0         22
                   D1         23
      B1     C1    D0         26
                   D1         27
             C3    D0         30
                   D1         31
A2    B0     C1    D0         34
                   D1         35
             C3    D0         38
                   D1         39
      B1     C1    D0         42
                   D1         43
             C3    D0         46
                   D1         47
A3    B0     C1    D0         50
                   D1         51
             C3    D0         54
                   D1         55
      B1     C1    D0         58
                   D1         59
             C3    D0         62
                   D1         63

[32 rows x 1 columns]
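
The three steps above can be collected into one function. The sketch below does that under the assumption of a smaller made-up frame, and compares the result with the boolean-mask approach from the question; `select_level` is a hypothetical name, not a pandas API.

```python
import numpy as np
import pandas as pd


def select_level(df, level, labels):
    """Hypothetical helper: keep rows whose `level` label is in `labels`."""
    indexer = [slice(None)] * df.index.nlevels      # select everything on every level
    indexer[df.index.names.index(level)] = labels   # narrow the one level of interest
    return df.loc[tuple(indexer), :]                # .loc needs a tuple, not a list


index = pd.MultiIndex.from_product(
    [["A0", "A1"], ["B0", "B1"], ["C0", "C1", "C2", "C3"]],
    names=["first", "second", "third"],
)
df = pd.DataFrame({"value": np.arange(len(index))}, index=index).sort_index()

by_slicer = select_level(df, "third", ["C1", "C3"])

# The boolean-mask approach from the question, for comparison.
mask = df.index.get_level_values("third").isin(["C1", "C3"])
by_mask = df[mask]

print(by_slicer["value"].tolist() == by_mask["value"].tolist())
```

On a sorted index the two approaches select the same rows; the mask version is a reasonable fallback when the index is not sorted.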

answered Sep 20 '22 05:09

Jeff


LondonRob
