Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas multi-index slices for level names

Tags:

python

pandas

The latest version of Pandas supports multi-index slicers. However, one needs to know the integer location of the different levels to use them properly.

E.g. the following:

idx = pd.IndexSlice
dfmi.loc[idx[:,:,['C1','C3']],idx[:,'foo']]

assumes that we know that the third row level is the one we want to index with C1 and C3, and that the second column level is the one we want to index with foo.

Sometimes I know the names of the levels but not their location in the multi-index. Is there a way to use multi-index slices in this case?

For example, say that I know what slices I want to apply on each level name, e.g. as a dictionary:

'level_name_1' -> ':' 
'level_name_2' -> ':'
'level_name_3' -> ['C1', 'C3']

but that I don't know the position (depth) of these levels in the multi-index. Does Pandas a built-in indexing mechanism for this?

Can I still use pd.IndexSlice objects somehow if I know level names, but not their position?

PD: I know I could could use reset_index() and then just work with flat columns, but I would like to avoid resetting the index (even if temporarily). I could also use query, but query requires index names to be compatible with Python identifiers (e.g. no spaces, etc).


The closest I have seen for the above is:

df.xs('C1', level='foo')

where foo is the name of the level and C1 is the value of interest.

I know that xs supports multiple keys, e.g.:

df.xs(('one', 'bar'), level=('second', 'first'), axis=1)

but it does not support slices or ranges (like pd.IndexSlice does).

like image 703
Amelio Vazquez-Reina Avatar asked Jun 09 '14 18:06

Amelio Vazquez-Reina


People also ask

How do I create a hierarchical index in pandas?

To make the column an index, we use the Set_index() function of pandas. If we want to make one column an index, we can simply pass the name of the column as a string in set_index(). If we want to do multi-indexing or Hierarchical Indexing, we pass the list of column names in the set_index().

What is hierarchical index?

Hierarchical indexing is a method of creating structured group relationships in data. These hierarchical indexes, or MultiIndexes, are highly flexible and offer a range of options when performing complex data queries.


1 Answers

This is still an open issue for enhancement, see here. Its pretty straightforward to support this. pull-requests are welcome!

You can easily do this as a work-around:

In [11]: midx = pd.MultiIndex.from_product([list(range(3)),['a','b','c'],pd.date_range('20130101',periods=3)],names=['numbers','letters','dates'])

In [12]: midx.names.index('letters')
Out[12]: 1

In [13]: midx.names.index('dates')
Out[13]: 2

Here's a complete example

In [18]: df = DataFrame(np.random.randn(len(midx),1),index=midx)

In [19]: df
Out[19]: 
                                   0
numbers letters dates               
0       a       2013-01-01  0.261092
                2013-01-02 -1.267770
                2013-01-03  0.008230
        b       2013-01-01 -1.515866
                2013-01-02  0.351942
                2013-01-03 -0.245463
        c       2013-01-01 -0.253103
                2013-01-02 -0.385411
                2013-01-03 -1.740821
1       a       2013-01-01 -0.108325
                2013-01-02 -0.212350
                2013-01-03  0.021097
        b       2013-01-01 -1.922214
                2013-01-02 -1.769003
                2013-01-03 -0.594216
        c       2013-01-01 -0.419775
                2013-01-02  1.511700
                2013-01-03  0.994332
2       a       2013-01-01 -0.020299
                2013-01-02 -0.749474
                2013-01-03 -1.478558
        b       2013-01-01 -1.357671
                2013-01-02  0.161185
                2013-01-03 -0.658246
        c       2013-01-01 -0.564796
                2013-01-02 -0.333106
                2013-01-03 -2.814611

This is your dict of level names -> slices

In [20]: slicers = { 'numbers' : slice(0,1), 'dates' : slice('20130102','20130103') }

This creates an indexer that is empty (selects everything)

In [21]: indexer = [ slice(None) ] * len(df.index.levels)

Add in your slicers

In [22]: for n, idx in slicers.items():
              indexer[df.index.names.index(n)] = idx

And select (this has to be a tuple, but was a list to start as we had to modify it)

In [23]: df.loc[tuple(indexer),:]
Out[23]: 
                                   0
numbers letters dates               
0       a       2013-01-02 -1.267770
                2013-01-03  0.008230
        b       2013-01-02  0.351942
                2013-01-03 -0.245463
        c       2013-01-02 -0.385411
                2013-01-03 -1.740821
1       a       2013-01-02 -0.212350
                2013-01-03  0.021097
        b       2013-01-02 -1.769003
                2013-01-03 -0.594216
        c       2013-01-02  1.511700
                2013-01-03  0.994332
like image 117
Jeff Avatar answered Oct 23 '22 11:10

Jeff