Pandas multi-index slices for level names

Tags:

pandas

The latest version of Pandas supports multi-index slicers. However, one needs to know the integer location of the different levels to use them properly.

E.g. the following:

idx = pd.IndexSlice
dfmi.loc[idx[:,:,['C1','C3']],idx[:,'foo']]

assumes that we know that the third row level is the one we want to index with C1 and C3, and that the second column level is the one we want to index with foo.
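To make the positional assumption concrete, here is a minimal sketch. The frame `dfmi`, its level names (`lvl0`, `lvl1` and the `level_name_*` names), and its values are made up for illustration; only the `.loc` call comes from the snippet above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: three row levels and two column levels.
rows = pd.MultiIndex.from_product(
    [["A0", "A1"], ["B0", "B1"], ["C0", "C1", "C2", "C3"]],
    names=["level_name_1", "level_name_2", "level_name_3"],
)
cols = pd.MultiIndex.from_product(
    [["a", "b"], ["bar", "foo"]], names=["lvl0", "lvl1"]
)
dfmi = pd.DataFrame(
    np.arange(len(rows) * len(cols)).reshape(len(rows), len(cols)),
    index=rows,
    columns=cols,
)

idx = pd.IndexSlice
# Purely positional: the list lands on the third row level,
# and 'foo' lands on the second column level.
result = dfmi.loc[idx[:, :, ["C1", "C3"]], idx[:, "foo"]]
```

If the levels were reordered, the same call would silently target different levels, which is the problem described here.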

Sometimes I know the names of the levels but not their location in the multi-index. Is there a way to use multi-index slices in this case?

For example, say that I know what slices I want to apply on each level name, e.g. as a dictionary:

'level_name_1' -> ':' 
'level_name_2' -> ':'
'level_name_3' -> ['C1', 'C3']

but that I don't know the position (depth) of these levels in the multi-index. Does Pandas have a built-in indexing mechanism for this?

Can I still use pd.IndexSlice objects somehow if I know level names, but not their position?

PS: I know I could use reset_index() and then just work with flat columns, but I would like to avoid resetting the index (even temporarily). I could also use query, but query requires index names to be valid Python identifiers (e.g. no spaces).

The closest I have seen for the above is:

df.xs('C1', level='foo')

where foo is the name of the level and C1 is the value of interest.

I know that xs supports multiple keys, e.g.:

df.xs(('one', 'bar'), level=('second', 'first'), axis=1)

but it does not support slices or ranges (like pd.IndexSlice does).
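For reference, a small sketch of the xs behaviour I mean. The frame and the level names `bar` and `foo` are invented for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical two-level row index; 'foo' is the level we know by name.
index = pd.MultiIndex.from_product(
    [["A0", "A1"], ["C1", "C2", "C3"]], names=["bar", "foo"]
)
df = pd.DataFrame({"value": np.arange(len(index))}, index=index)

# Select by level name, without knowing the level's position.
# By default xs drops the selected level from the result.
by_name = df.xs("C1", level="foo")

# drop_level=False keeps the 'foo' level in the result.
kept = df.xs("C1", level="foo", drop_level=False)
```

This works for a single key per level, but I have not found a way to pass something like ['C1', 'C3'] or a range to it.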


asked Jun 09 '14 18:06

Amelio Vazquez-Reina

1 Answers

This is still an open issue for enhancement, see here. It's pretty straightforward to support this. Pull requests are welcome!

You can easily do this as a work-around:

In [11]: midx = pd.MultiIndex.from_product([list(range(3)),['a','b','c'],pd.date_range('20130101',periods=3)],names=['numbers','letters','dates'])

In [12]: midx.names.index('letters')
Out[12]: 1

In [13]: midx.names.index('dates')
Out[13]: 2

Here's a complete example

In [18]: df = pd.DataFrame(np.random.randn(len(midx),1),index=midx)

In [19]: df
Out[19]: 
                                   0
numbers letters dates               
0       a       2013-01-01  0.261092
                2013-01-02 -1.267770
                2013-01-03  0.008230
        b       2013-01-01 -1.515866
                2013-01-02  0.351942
                2013-01-03 -0.245463
        c       2013-01-01 -0.253103
                2013-01-02 -0.385411
                2013-01-03 -1.740821
1       a       2013-01-01 -0.108325
                2013-01-02 -0.212350
                2013-01-03  0.021097
        b       2013-01-01 -1.922214
                2013-01-02 -1.769003
                2013-01-03 -0.594216
        c       2013-01-01 -0.419775
                2013-01-02  1.511700
                2013-01-03  0.994332
2       a       2013-01-01 -0.020299
                2013-01-02 -0.749474
                2013-01-03 -1.478558
        b       2013-01-01 -1.357671
                2013-01-02  0.161185
                2013-01-03 -0.658246
        c       2013-01-01 -0.564796
                2013-01-02 -0.333106
                2013-01-03 -2.814611

This is your dict of level names -> slices

In [20]: slicers = { 'numbers' : slice(0,1), 'dates' : slice('20130102','20130103') }

This creates an indexer that is empty (selects everything)

In [21]: indexer = [ slice(None) ] * len(df.index.levels)

Add in your slicers

In [22]: for n, idx in slicers.items():
   ....:     indexer[df.index.names.index(n)] = idx

And select. The indexer passed to .loc has to be a tuple; it started out as a list only because we had to modify it.

In [23]: df.loc[tuple(indexer),:]
Out[23]: 
                                   0
numbers letters dates               
0       a       2013-01-02 -1.267770
                2013-01-03  0.008230
        b       2013-01-02  0.351942
                2013-01-03 -0.245463
        c       2013-01-02 -0.385411
                2013-01-03 -1.740821
1       a       2013-01-02 -0.212350
                2013-01-03  0.021097
        b       2013-01-02 -1.769003
                2013-01-03 -0.594216
        c       2013-01-02  1.511700
                2013-01-03  0.994332


answered Oct 23 '22 11:10

Jeff
