Pandas dataframe with MultiIndex: check if string is contained in index level

Tags: python, pandas

Let's say I have a multi-indexed pandas dataframe that looks like the following one, taken from the documentation.

import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]

df = pd.DataFrame(np.random.randn(8, 4), index=arrays)

Which looks like this:

                0         1         2         3
bar one -0.096648 -0.080298  0.859359 -0.030288
    two  0.043107 -0.431791  1.923893 -1.544845
baz one  0.639951 -0.008833 -0.227000  0.042315
    two  0.705281  0.446257 -1.108522  0.471676
foo one -0.579483 -2.261138 -0.826789  1.543524
    two -0.358526  1.416211  1.589617  0.284130
qux one  0.498149 -0.296404  0.127512 -0.224526
    two -0.286687 -0.040473  1.443701  1.025008

Now I only want the rows where "ne" is contained in the second level of the MultiIndex.

Is there a way to select rows from a MultiIndex by a (partial) string match on one of its levels?

Cord Kaldemeyer asked Jan 13 '16 15:01

1 Answer

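You can build a boolean mask from the second index level and use it to select rows (assigning to a new name keeps the original df intact):

result = df.loc[df.index.get_level_values(1).str.contains('ne')]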

which returns:

bar one -0.143200  0.523617  0.376458 -2.091154
baz one -0.198220  1.234587 -0.232862 -0.510039
foo one -0.426127  0.594426  0.457331 -0.459682
qux one -0.875160 -0.157073 -0.540459 -1.792235
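
To make the intermediate steps visible, here is a minimal self-contained sketch of the same idea. The seeded generator and the names `second_level`, `mask`, and `filtered` are my own choices for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]

# Seeded generator so the values are reproducible (the seed is arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((8, 4)), index=arrays)

# Labels of the second index level, one entry per row.
second_level = df.index.get_level_values(1)

# Element-wise substring test: one boolean per row.
mask = second_level.str.contains('ne')

# Boolean indexing keeps only the rows where the mask is True.
filtered = df[mask]

print(filtered.index.tolist())
```

Since `get_level_values` returns an ordinary `Index`, the usual `.str` accessor methods should be usable in the same way as `contains`.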

EDIT: You can also combine masks on several levels with logical operators, e.g.:

result = df.loc[df.index.get_level_values(0).str.contains('ba') | df.index.get_level_values(1).str.contains('ne')]

which returns:

bar one  0.620279  1.525277  0.379649 -0.032608
    two  0.465240 -0.190038  0.795730  1.720368
baz one  0.986828 -0.080394 -0.303319  0.747483
    two  0.487534  1.597006  0.114551  0.299502
foo one -0.085700  0.112433  0.704043  0.264280
qux one -0.291758 -1.071669  0.794354 -1.805530
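
As a rough sketch of how such masks combine, the snippet below builds one mask per level and joins them with `|` (either), `&` (both), and `~` (negation). The variable names and the seed are assumptions made for the example. Passing `regex=False` is optional here; it just makes `contains` treat the pattern as a literal substring instead of a regular expression.

```python
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]

# Seeded generator so the values are reproducible (the seed is arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((8, 4)), index=arrays)

# One boolean mask per index level.
has_ba = df.index.get_level_values(0).str.contains('ba', regex=False)
has_ne = df.index.get_level_values(1).str.contains('ne', regex=False)

either = df[has_ba | has_ne]      # rows matching at least one condition
both = df[has_ba & has_ne]        # rows matching both conditions
neither = df[~(has_ba | has_ne)]  # rows matching neither condition

print(either.index.tolist())
print(both.index.tolist())
print(neither.index.tolist())
```

Keeping each mask in its own variable avoids the operator-precedence pitfalls of long one-liners, where every comparison has to be wrapped in parentheses.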

Fabio Lamanna answered Oct 02 '22 15:10