How can I get the value from a dataframe by its multi-index?
For example, I have a dataframe mm:
np.random.seed(1)
mm = pd.DataFrame(np.random.randn(5,2))
mm['A'] = np.arange(5)
mm['B'] = np.arange(5,10)
mm.set_index(['A','B'], inplace=True)
print(mm)
            0         1
A B
0 5  1.624345 -0.611756
1 6 -0.528172 -1.072969
2 7  0.865408 -2.301539
3 8  1.744812 -0.761207
4 9  0.319039 -0.249370
I want to get the value where A = 2 and B = 7. How can I do that?
Is it possible to write a function like get_value(mm, (2,7)) that returns the following result?
2 7 0.865408 -2.301539
Using slicers: you can slice a MultiIndex by providing multiple indexers. You can provide any of the selectors as if you were indexing by label (see Selection by Label), including slices, lists of labels, labels, and boolean indexers. You can use slice(None) to select all the contents of a level.
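As a minimal sketch of the slicer form described above, assuming the same mm frame as in the question: slice(None) stands in for "everything on this level", and a list of labels selects several values on a level.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
np.random.seed(1)
mm = pd.DataFrame(np.random.randn(5, 2))
mm['A'] = np.arange(5)
mm['B'] = np.arange(5, 10)
mm = mm.set_index(['A', 'B'])

# All values of level A, only B == 7; the trailing ':' selects every column.
rows_b7 = mm.loc[(slice(None), 7), :]
print(rows_b7)

# A list of labels on level A, everything on level B.
rows_a02 = mm.loc[([0, 2], slice(None)), :]
print(rows_a02)
```

Slicing like this relies on the index being sorted, which is the case for this example.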
A multi-index dataframe has multi-level, or hierarchical, indexing. The reset_index() method converts the index levels into ordinary columns: DataFrame.reset_index() resets the index to the default integer index and turns the old index levels into columns of the dataframe.
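A short sketch of that approach, assuming the mm frame from the question: after reset_index() the levels A and B are plain columns, so an ordinary boolean mask can pick out the row.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
np.random.seed(1)
mm = pd.DataFrame(np.random.randn(5, 2))
mm['A'] = np.arange(5)
mm['B'] = np.arange(5, 10)
mm = mm.set_index(['A', 'B'])

# Move the index levels A and B back into columns.
flat = mm.reset_index()
print(flat.columns.tolist())

# Select the row with an ordinary boolean mask on the columns.
hit = flat[(flat['A'] == 2) & (flat['B'] == 7)]
print(hit)
```

This trades the label-based lookup for a full column scan, so it is mainly useful when the data is needed in flat form anyway.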
Use mm.loc to select rows by label:
In [28]: row = mm.loc[2,7]; row
Out[28]:
0 0.865408
1 -2.301539
Name: (2, 7), dtype: float64
In [40]: np.concatenate([row.name, row])
Out[40]: array([ 2. , 7. , 0.86540763, -2.3015387 ])
Since mm has a MultiIndex, each row label is expressed as a tuple, e.g. (2, 7). When there is no ambiguity, such as inside brackets, the parentheses can be dropped: mm.loc[2, 7] is equivalent to mm.loc[(2, 7)].
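The get_value function asked for in the question could be sketched along these lines. The name and signature come from the question; the body is only one possible implementation and assumes the key names exactly one row.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
np.random.seed(1)
mm = pd.DataFrame(np.random.randn(5, 2))
mm['A'] = np.arange(5)
mm['B'] = np.arange(5, 10)
mm = mm.set_index(['A', 'B'])

def get_value(df, key):
    """Hypothetical helper: index labels followed by the row's values.

    Assumes `key` is a full tuple of index labels that matches one row;
    a missing key raises KeyError, as df.loc does.
    """
    row = df.loc[tuple(key), :]   # a Series whose .name is the key tuple
    return list(row.name) + row.tolist()

result = get_value(mm, (2, 7))
print(result)
```

Returning a plain list keeps the integer labels as integers, whereas np.concatenate casts everything to float.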
To get all rows where B=7, you could use pd.IndexSlice:
xs = pd.IndexSlice
mm.loc[xs[:, 7], :]
or the mm.query method:
mm.query('B==7')
or mm.index.get_loc_level with mm.loc:
mask, idx = mm.index.get_loc_level(7, level='B')
mm.loc[mask]
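or mm.index.get_loc_level with mm.iloc. The first return value is a boolean mask over the rows; the second is the remaining index with level B dropped, not a set of positions, so convert the mask to integer positions:
mask, idx = mm.index.get_loc_level(7, level='B')
mm.iloc[np.flatnonzero(mask)]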
Each of the expressions above returns the DataFrame
0 1
A B
2 7 0.865408 -2.301539
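The following sketch runs the selection approaches side by side on the mm frame from the question, to check that they pick the same row. DataFrame.xs with drop_level=False is included as a further option that does not appear in the answer above.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
np.random.seed(1)
mm = pd.DataFrame(np.random.randn(5, 2))
mm['A'] = np.arange(5)
mm['B'] = np.arange(5, 10)
mm = mm.set_index(['A', 'B'])

# get_loc_level returns (row locator, remaining index); only the first is used.
mask, _ = mm.index.get_loc_level(7, level='B')

results = {
    'IndexSlice': mm.loc[pd.IndexSlice[:, 7], :],
    'query': mm.query('B == 7'),
    'get_loc_level': mm.loc[mask],
    # Not in the answer above: xs keeps level B when drop_level=False.
    'xs': mm.xs(7, level='B', drop_level=False),
}

for name, frame in results.items():
    print(name, list(frame.index), frame.values.tolist())
```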
This returns your selection as a dataframe:
>>> mm.loc[[(2, 7)]]
0 1
A B
2 7 0.865408 -2.301539
To get the index and values:
>>> mm.loc[[(2, 7)]].reset_index().values.tolist()[0]
[2.0, 7.0, 0.8654076293246785, -2.3015386968802827]
To get all values where the second item is 7:
>>> idx = pd.IndexSlice
>>> mm.loc[idx[:, 7], :]
0 1
A B
2 7 0.865408 -2.301539