I have a dataframe which looks like this:
import pandas as pd
import numpy as np
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']), np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame([[24, 13, 8, 9],
[11, 30, 7, 23],
[21, 31, 12, 30],
[ 2, 5, 19, 24],
[15, 18, 3, 16],
[ 2, 24, 28, 11],
[23, 9, 6, 12],
[29, 28, 11, 21]], index=arrays, columns=list('abcd'))
df
a b c d
bar one 24 13 8 9
two 11 30 7 23
baz one 21 31 12 30
two 2 5 19 24
foo one 15 18 3 16
two 2 24 28 11
qux one 23 9 6 12
two 29 28 11 21
I want to slice the dataframe such that the results contains all rows which have foo
as value for their first index and all the rows which have bar
as first level index and two
as second level index.
I.e. the resulting dataframe shoud look like this:
a b c d
bar two 11 30 7 23
foo one 15 18 3 16
two 2 24 28 11
One way to get this result is
pd.concat([df.loc[[('bar', 'two')],:], df.loc[('foo', slice(None)),:]])
but this feels like a very cumbersome way, there must be a more "pythonic" way..
You can slice a MultiIndex by providing multiple indexers. You can provide any of the selectors as if you are indexing by label, see Selection by Label, including slices, lists of labels, labels, and boolean indexers. You can use slice(None) to select all the contents of that level.
from_tuples() function is used to convert list of tuples to MultiIndex. It is one of the several ways in which we construct a MultiIndex.
So, the Hierarchy of indexes is formed that's why this is called Hierarchical indexing. We may sometimes need to make a column as an index, or we want to convert an index column into the normal column, so there is a pandas reset_index(inplace = True) function, which makes the index column the normal column.
To revert the index of the dataframe from multi-index to a single index using the Pandas inbuilt function reset_index(). Returns: (Data Frame or None) DataFrame with the new index or None if inplace=True.
query
to the rescue:
df.query('ilevel_0 == "foo" or (ilevel_0 == "bar" and ilevel_1 == "two")')
a b c d
bar two 11 30 7 23
foo one 15 18 3 16
two 2 24 28 11
xs
, loc
, etc all fail because your slicing across levels is not consistent.
You can use default slicing
l0 = df.index.get_level_values(0)
l1 = df.index.get_level_values(1)
cond = (l0 == "foo") | ((l0=="bar") & (l1=="two"))
df[cond]
Output
a b c d
bar two 11 30 7 23
foo one 15 18 3 16
two 2 24 28 11
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With