How to get value by multi-index with python pandas?

How can I get the value from a dataframe by its multi-index?

For example, I have a DataFrame mm:

import numpy as np
import pandas as pd

np.random.seed(1)
mm = pd.DataFrame(np.random.randn(5,2))
mm['A'] = np.arange(5)
mm['B'] = np.arange(5,10)
mm.set_index(['A','B'], inplace=True)

print(mm)

        0         1
A B                    
0 5  1.624345 -0.611756
1 6 -0.528172 -1.072969
2 7  0.865408 -2.301539
3 8  1.744812 -0.761207
4 9  0.319039 -0.249370

I want to get the value where A = 2, B = 7, how can I do that?

Is it possible to write a function like get_value(mm, (2,7)) that returns the following result?

2 7  0.865408 -2.301539
xirururu asked Apr 08 '16 22:04




2 Answers

Use mm.loc to select rows by label:

In [28]: row = mm.loc[2,7]; row
Out[28]: 
0    0.865408
1   -2.301539
Name: (2, 7), dtype: float64

In [40]: np.concatenate([row.name, row])
Out[40]: array([ 2.        ,  7.        ,  0.86540763, -2.3015387 ])

Since mm has a MultiIndex, each row label is expressed as a tuple (e.g. (2,7)). When there is no ambiguity, such as inside brackets, the parentheses can be dropped: mm.loc[2, 7] is equivalent to mm.loc[(2, 7)].
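The get_value(mm, (2,7)) function asked for in the question could be sketched roughly as below. The helper name get_value comes from the question and is not a pandas function; the sketch assumes the caller wants a one-row DataFrame that keeps the (A, B) index, which is why the tuple is wrapped in a list before being passed to .loc.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
np.random.seed(1)
mm = pd.DataFrame(np.random.randn(5, 2))
mm['A'] = np.arange(5)
mm['B'] = np.arange(5, 10)
mm = mm.set_index(['A', 'B'])


def get_value(df, key):
    """Hypothetical helper: return the row(s) labelled `key` as a DataFrame."""
    # Wrapping the tuple in a list makes .loc return a DataFrame that
    # keeps the MultiIndex, instead of a Series.
    return df.loc[[key]]


as_series = mm.loc[(2, 7)]         # Series whose name is the tuple (2, 7)
as_frame = get_value(mm, (2, 7))   # one-row DataFrame indexed by (A, B)
print(as_frame)
```

Passing the bare tuple gives a Series with the row label stored in its name attribute; passing a list containing the tuple gives a DataFrame, which is closer to the output shown in the question.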


To get all rows where B=7, you could

  • use pd.IndexSlice:

    xs = pd.IndexSlice
    mm.loc[xs[:, 7], :]
    
  • or the mm.query method:

    mm.query('B==7')
    
  • or mm.index.get_loc_level with mm.loc:

    mask, idx = mm.index.get_loc_level(7, level='B')
    mm.loc[mask]
    
  • or mm.index.get_loc_level with mm.iloc:

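    # idx is the index with level 'B' dropped (labels, not positions),
    # so pass the boolean mask to iloc instead
    mask, idx = mm.index.get_loc_level(7, level='B')
    mm.iloc[mask]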
    

Each of the expressions above returns the DataFrame:

            0         1
A B                    
2 7  0.865408 -2.301539
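Two further ways to select on a single index level, not mentioned in the answer itself, are a boolean mask built with Index.get_level_values and the DataFrame.xs cross-section method. The sketch below rebuilds mm from the question and compares both with the pd.IndexSlice form; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
np.random.seed(1)
mm = pd.DataFrame(np.random.randn(5, 2))
mm['A'] = np.arange(5)
mm['B'] = np.arange(5, 10)
mm = mm.set_index(['A', 'B'])

# Boolean mask built directly from the values of index level 'B'.
by_level_values = mm[mm.index.get_level_values('B') == 7]

# Cross-section on level 'B'; drop_level=False keeps 'B' in the index.
by_xs = mm.xs(7, level='B', drop_level=False)

# The IndexSlice form from the list above, for comparison.
by_slice = mm.loc[pd.IndexSlice[:, 7], :]

print(by_level_values)
print(by_xs)
print(by_slice)
```

By default xs drops the level it selects on, so drop_level=False is needed to get the same (A, B) index as the other expressions.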
unutbu answered Oct 26 '22 01:10



This returns your selection as a dataframe:

>>> mm.loc[[(2, 7)]]
            0         1
A B                    
2 7  0.865408 -2.301539

To get the index and values:

>>> mm.loc[[(2, 7)]].reset_index().values.tolist()[0]
[2.0, 7.0, 0.8654076293246785, -2.3015386968802827]
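Note that the index labels 2 and 7 come back as floats here, because .values packs the integer index columns and the float data columns into a single float array. If the labels should stay integers, one possible alternative (a sketch, not part of the original answer) is to select the row as a Series and combine its name tuple with its values:

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
np.random.seed(1)
mm = pd.DataFrame(np.random.randn(5, 2))
mm['A'] = np.arange(5)
mm['B'] = np.arange(5, 10)
mm = mm.set_index(['A', 'B'])

# Approach from the answer: the labels are upcast to float.
via_reset = mm.loc[[(2, 7)]].reset_index().values.tolist()[0]

# Alternative: the Series name holds the (A, B) label tuple unchanged,
# so the labels keep their integer type.
row = mm.loc[(2, 7)]
label_and_values = [*row.name, *row.tolist()]

print(via_reset)
print(label_and_values)
```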

To get all values where the second item is 7:

>>> idx = pd.IndexSlice
>>> mm.loc[idx[:, 7], :]
            0         1
A B                    
2 7  0.865408 -2.301539
Alexander answered Oct 26 '22 00:10
