The latest version of Pandas supports multi-index slicers. However, one needs to know the integer location of the different levels to use them properly.
E.g. the following:
idx = pd.IndexSlice
dfmi.loc[idx[:,:,['C1','C3']],idx[:,'foo']]
assumes that we know that the third row level is the one we want to index with C1
and C3
, and that the second column level is the one we want to index with foo
.
Sometimes I know the names of the levels but not their location in the multi-index. Is there a way to use multi-index slices in this case?
For example, say that I know what slices I want to apply on each level name, e.g. as a dictionary:
'level_name_1' -> ':'
'level_name_2' -> ':'
'level_name_3' -> ['C1', 'C3']
but that I don't know the position (depth) of these levels in the multi-index. Does Pandas a built-in indexing mechanism for this?
Can I still use pd.IndexSlice
objects somehow if I know level names, but not their position?
PD: I know I could could use reset_index()
and then just work with flat columns, but I would like to avoid resetting the index (even if temporarily). I could also use query
, but query
requires index names to be compatible with Python identifiers (e.g. no spaces, etc).
The closest I have seen for the above is:
df.xs('C1', level='foo')
where foo
is the name of the level and C1
is the value of interest.
I know that xs
supports multiple keys, e.g.:
df.xs(('one', 'bar'), level=('second', 'first'), axis=1)
but it does not support slices or ranges (like pd.IndexSlice
does).
To make the column an index, we use the Set_index() function of pandas. If we want to make one column an index, we can simply pass the name of the column as a string in set_index(). If we want to do multi-indexing or Hierarchical Indexing, we pass the list of column names in the set_index().
Hierarchical indexing is a method of creating structured group relationships in data. These hierarchical indexes, or MultiIndexes, are highly flexible and offer a range of options when performing complex data queries.
This is still an open issue for enhancement, see here. Its pretty straightforward to support this. pull-requests are welcome!
You can easily do this as a work-around:
In [11]: midx = pd.MultiIndex.from_product([list(range(3)),['a','b','c'],pd.date_range('20130101',periods=3)],names=['numbers','letters','dates'])
In [12]: midx.names.index('letters')
Out[12]: 1
In [13]: midx.names.index('dates')
Out[13]: 2
Here's a complete example
In [18]: df = DataFrame(np.random.randn(len(midx),1),index=midx)
In [19]: df
Out[19]:
0
numbers letters dates
0 a 2013-01-01 0.261092
2013-01-02 -1.267770
2013-01-03 0.008230
b 2013-01-01 -1.515866
2013-01-02 0.351942
2013-01-03 -0.245463
c 2013-01-01 -0.253103
2013-01-02 -0.385411
2013-01-03 -1.740821
1 a 2013-01-01 -0.108325
2013-01-02 -0.212350
2013-01-03 0.021097
b 2013-01-01 -1.922214
2013-01-02 -1.769003
2013-01-03 -0.594216
c 2013-01-01 -0.419775
2013-01-02 1.511700
2013-01-03 0.994332
2 a 2013-01-01 -0.020299
2013-01-02 -0.749474
2013-01-03 -1.478558
b 2013-01-01 -1.357671
2013-01-02 0.161185
2013-01-03 -0.658246
c 2013-01-01 -0.564796
2013-01-02 -0.333106
2013-01-03 -2.814611
This is your dict of level names -> slices
In [20]: slicers = { 'numbers' : slice(0,1), 'dates' : slice('20130102','20130103') }
This creates an indexer that is empty (selects everything)
In [21]: indexer = [ slice(None) ] * len(df.index.levels)
Add in your slicers
In [22]: for n, idx in slicers.items():
indexer[df.index.names.index(n)] = idx
And select (this has to be a tuple, but was a list to start as we had to modify it)
In [23]: df.loc[tuple(indexer),:]
Out[23]:
0
numbers letters dates
0 a 2013-01-02 -1.267770
2013-01-03 0.008230
b 2013-01-02 0.351942
2013-01-03 -0.245463
c 2013-01-02 -0.385411
2013-01-03 -1.740821
1 a 2013-01-02 -0.212350
2013-01-03 0.021097
b 2013-01-02 -1.769003
2013-01-03 -0.594216
c 2013-01-02 1.511700
2013-01-03 0.994332
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With