I have read the documentation on Advanced indexing with hierarchical index, where using .loc for a MultiIndex is explained. I have also read this thread: Using .loc with a MultiIndex in pandas?
Still, I don't see how to select rows where (first index == some value) or (second index == some value).
Example:
import pandas as pd
index = pd.MultiIndex.from_arrays([['a', 'a', 'a', 'b', 'b', 'b'],
                                   ['a', 'b', 'c', 'a', 'b', 'c']],
                                  names=['i0', 'i1'])
df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6], 'y': [6, 5, 4, 3, 2, 1]}, index=index)
This gives the DataFrame:
       x  y
i0 i1
a  a   1  6
   b   2  5
   c   3  4
b  a   4  3
   b   5  2
   c   6  1
How can I get the rows where i0 == 'b' or i1 == 'b'?
       x  y
i0 i1
a  b   2  5
b  a   4  3
   b   5  2
   c   6  1
MultiIndex.from_tuples() converts a list of tuples into a MultiIndex. It is one of several ways to construct a MultiIndex; from_arrays(), used above, is another.
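As a quick illustration, the index from the question can be built either way. This is a minimal sketch; the variable names are illustrative only. MultiIndex.equals compares the labels, so the two constructions should compare equal.

```python
import pandas as pd

# The question's index built from two parallel arrays (one per level)
from_arrays = pd.MultiIndex.from_arrays(
    [list("aaabbb"), list("abcabc")], names=["i0", "i1"]
)

# The same index built from one tuple per row
from_tuples = pd.MultiIndex.from_tuples(
    [("a", "a"), ("a", "b"), ("a", "c"), ("b", "a"), ("b", "b"), ("b", "c")],
    names=["i0", "i1"],
)

print(from_tuples.equals(from_arrays))  # True
```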
DataFrame.xs() returns a cross-section (row(s) or column(s)) of a Series/DataFrame, selected by label. Similarly, .loc accesses a group of values using labels; note that with a single label on a single-level index, .loc returns that row as a Series.
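xs() selects on one level at a time, so it cannot express the "or" directly. One possible workaround, sketched below under the assumption that a union of two cross-sections is acceptable, is to take a cross-section per level with drop_level=False (so both levels are kept), concatenate them, and drop the row that matches both conditions. Note that sort_index() restores lexical order, which happens to equal the original order here but not in general.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [list("aaabbb"), list("abcabc")], names=["i0", "i1"]
)
df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6], "y": [6, 5, 4, 3, 2, 1]}, index=index)

# One cross-section per level; drop_level=False keeps both index levels
by_i0 = df.xs("b", level="i0", drop_level=False)
by_i1 = df.xs("b", level="i1", drop_level=False)

# Union of the two: ("b", "b") is in both, so keep only its first occurrence
combined = pd.concat([by_i0, by_i1])
result = combined[~combined.index.duplicated()].sort_index()
print(result)
```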
A MultiIndex DataFrame has multi-level, or hierarchical, indexing. The reset_index() method converts the index levels into ordinary columns and replaces the index with the default integer index.
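Applied to this question, one option is to move the levels into columns, filter with an ordinary boolean mask, and then restore the MultiIndex. This is a sketch rather than the only way; the names flat and mask are illustrative. Because rows are filtered in place, the original row order is preserved.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [list("aaabbb"), list("abcabc")], names=["i0", "i1"]
)
df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6], "y": [6, 5, 4, 3, 2, 1]}, index=index)

# Turn the index levels i0 and i1 into regular columns
flat = df.reset_index()

# Plain column comparisons combined with element-wise OR
mask = (flat["i0"] == "b") | (flat["i1"] == "b")

# Filter, then rebuild the two-level index from the same columns
result = flat[mask].set_index(["i0", "i1"])
print(result)
```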
I think the easiest approach is DataFrame.query, which lets you refer to the MultiIndex levels by name:
import pandas as pd
index = pd.MultiIndex.from_arrays([list("aaabbb"),
                                   list("abcabc")],
                                  names=['i0', 'i1'])
df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6], 'y': [6, 5, 4, 3, 2, 1]}, index=index)
df.query('i0 == "b" | i1 == "b"')
returns:
       x  y
i0 i1
a  b   2  5
b  a   4  3
   b   5  2
   c   6  1
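If the value to match is held in a Python variable rather than hard-coded, query() can reference it with the @ prefix. The sketch below assumes a variable named target (illustrative only) and uses the keyword "or", which query() accepts as an alternative to |.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [list("aaabbb"), list("abcabc")], names=["i0", "i1"]
)
df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6], "y": [6, 5, 4, 3, 2, 1]}, index=index)

# Value to look for on either level; @target refers to this local variable
target = "b"
result = df.query("i0 == @target or i1 == @target")
print(result)
```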
Alternatively, use get_level_values() to build a boolean mask for each level and combine the masks with |:
>>> mask = (df.index.get_level_values(0)=='b') | (df.index.get_level_values(1)=='b')
>>> df[mask] # same as df.loc[mask]
       x  y
i0 i1
a  b   2  5
b  a   4  3
   b   5  2
   c   6  1
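The same idea generalizes to any number of levels. The sketch below uses a hypothetical helper, rows_where_any_level_equals (not a pandas function), that looks up each level by name with get_level_values() and ORs the per-level masks together with np.logical_or.reduce.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [list("aaabbb"), list("abcabc")], names=["i0", "i1"]
)
df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6], "y": [6, 5, 4, 3, 2, 1]}, index=index)


def rows_where_any_level_equals(frame, value):
    """Hypothetical helper: keep rows where ANY index level equals value."""
    # One boolean array per level, looked up by the level's name
    masks = [frame.index.get_level_values(name) == value for name in frame.index.names]
    # Element-wise OR across all per-level masks
    combined = np.logical_or.reduce(masks)
    return frame[combined]


result = rows_where_any_level_equals(df, "b")
print(result)
```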