The latest version of Pandas supports multi-index slicers. However, one needs to know the integer location of the different levels to use them properly.
E.g. the following:
idx = pd.IndexSlice
dfmi.loc[idx[:,:,['C1','C3']],idx[:,'foo']]
assumes that we know that the third row level is the one we want to index with C1 and C3, and that the second column level is the one we want to index with foo.
Sometimes I know the names of the levels but not their location in the multi-index. Is there a way to use multi-index slices in this case?
For example, say that I know what slices I want to apply on each level name, e.g. as a dictionary:
'level_name_1' -> ':'
'level_name_2' -> ':'
'level_name_3' -> ['C1', 'C3']
but that I don't know the position (depth) of these levels in the multi-index. Does Pandas a built-in indexing mechanism for this?
Can I still use pd.IndexSlice objects somehow if I know level names, but not their position?
PD: I know I could could use reset_index() and then just work with flat columns, but I would like to avoid resetting the index (even if temporarily). I could also use query, but query requires index names to be compatible with Python identifiers (e.g. no spaces, etc).
The closest I have seen for the above is:
df.xs('C1', level='foo')
where foo is the name of the level and C1 is the value of interest.
I know that xs supports multiple keys, e.g.:
df.xs(('one', 'bar'), level=('second', 'first'), axis=1)
but it does not support slices or ranges (like pd.IndexSlice does).
To make the column an index, we use the Set_index() function of pandas. If we want to make one column an index, we can simply pass the name of the column as a string in set_index(). If we want to do multi-indexing or Hierarchical Indexing, we pass the list of column names in the set_index().
Hierarchical indexing is a method of creating structured group relationships in data. These hierarchical indexes, or MultiIndexes, are highly flexible and offer a range of options when performing complex data queries.
This is still an open issue for enhancement, see here. Its pretty straightforward to support this. pull-requests are welcome!
You can easily do this as a work-around:
In [11]: midx = pd.MultiIndex.from_product([list(range(3)),['a','b','c'],pd.date_range('20130101',periods=3)],names=['numbers','letters','dates'])
In [12]: midx.names.index('letters')
Out[12]: 1
In [13]: midx.names.index('dates')
Out[13]: 2
Here's a complete example
In [18]: df = DataFrame(np.random.randn(len(midx),1),index=midx)
In [19]: df
Out[19]:
0
numbers letters dates
0 a 2013-01-01 0.261092
2013-01-02 -1.267770
2013-01-03 0.008230
b 2013-01-01 -1.515866
2013-01-02 0.351942
2013-01-03 -0.245463
c 2013-01-01 -0.253103
2013-01-02 -0.385411
2013-01-03 -1.740821
1 a 2013-01-01 -0.108325
2013-01-02 -0.212350
2013-01-03 0.021097
b 2013-01-01 -1.922214
2013-01-02 -1.769003
2013-01-03 -0.594216
c 2013-01-01 -0.419775
2013-01-02 1.511700
2013-01-03 0.994332
2 a 2013-01-01 -0.020299
2013-01-02 -0.749474
2013-01-03 -1.478558
b 2013-01-01 -1.357671
2013-01-02 0.161185
2013-01-03 -0.658246
c 2013-01-01 -0.564796
2013-01-02 -0.333106
2013-01-03 -2.814611
This is your dict of level names -> slices
In [20]: slicers = { 'numbers' : slice(0,1), 'dates' : slice('20130102','20130103') }
This creates an indexer that is empty (selects everything)
In [21]: indexer = [ slice(None) ] * len(df.index.levels)
Add in your slicers
In [22]: for n, idx in slicers.items():
indexer[df.index.names.index(n)] = idx
And select (this has to be a tuple, but was a list to start as we had to modify it)
In [23]: df.loc[tuple(indexer),:]
Out[23]:
0
numbers letters dates
0 a 2013-01-02 -1.267770
2013-01-03 0.008230
b 2013-01-02 0.351942
2013-01-03 -0.245463
c 2013-01-02 -0.385411
2013-01-03 -1.740821
1 a 2013-01-02 -0.212350
2013-01-03 0.021097
b 2013-01-02 -1.769003
2013-01-03 -0.594216
c 2013-01-02 1.511700
2013-01-03 0.994332
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With