Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas: conditional select using .loc with MultiIndex

I have read doc of Advanced indexing with hierarchical index where using .loc for MultiIndex is explained. Also this thread: Using .loc with a MultiIndex in pandas?

Still I don't see how select rows where (first index == some value) or (second index == some value)

Example:

import pandas as pd

index = pd.MultiIndex.from_arrays([['a', 'a', 'a', 'b', 'b', 'b'],
                                  ['a', 'b', 'c', 'a', 'b', 'c']],
                                  names=['i0', 'i1'])
df = pd.DataFrame({'x': [1,2,3,4,5,6], 'y': [6,5,4,3,2,1]}, index=index)

Is this DataFrame:

       x  y
i0 i1      
a  a   1  6
   b   2  5
   c   3  4
b  a   4  3
   b   5  2
   c   6  1

How can I get rows where i0 == 'b' or i1 == 'b'?

       x  y
i0 i1      
a  b   2  5
b  a   4  3
   b   5  2
   c   6  1
like image 208
Mr weasel Avatar asked Jun 25 '18 23:06

Mr weasel


People also ask

What does the pandas function MultiIndex From_tuples do?

from_tuples() function is used to convert list of tuples to MultiIndex. It is one of the several ways in which we construct a MultiIndex.

What does .LOC return in pandas?

Returns a cross-section (row(s) or column(s)) from the Series/DataFrame. Access group of values using labels. Single label. Note this returns the row as a Series.

How do I change multiple index to columns in pandas?

A multi-index dataframe has multi-level, or hierarchical indexing. We can easily convert the multi-level index into the column by the reset_index() method. DataFrame. reset_index() is used to reset the index to default and make the index a column of the dataframe.


2 Answers

I think the easier answer is to use the DataFrame.query function which allows you to query the multi-index by name as follows:

import pandas as pd
import numpy as np

index = pd.MultiIndex.from_arrays([list("aaabbb"),
                                  list("abcabc")],
                                  names=['i0', 'i1'])
df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6], 'y': [6, 5, 4, 3, 2, 1]}, index=index)


df.query('i0 == "b" | i1 == "b"')

returns:

       x  y
i0 i1      
a  b   2  5
b  a   4  3
   b   5  2
   c   6  1
like image 125
mmann1123 Avatar answered Sep 17 '22 22:09

mmann1123


Use get_level_values()

>>> mask = (df.index.get_level_values(0)=='b') | (df.index.get_level_values(1)=='b') 
>>> df[mask]  # same as df.loc[mask]


        x   y
i0  i1      
a   b   2   5
b   a   4   3
    b   5   2
    c   6   1
like image 20
rafaelc Avatar answered Sep 18 '22 22:09

rafaelc