
String "contains"-slicing on Pandas MultiIndex

Tags: python, pandas

How can I slice a MultiIndex by its string content, i.e. select the rows where a particular index level contains a certain string?

In [12]: df = pd.DataFrame({'a': ['a', 'ab', 'b'],
                            'c': ['d', 'd', 'd'],
                            'val': [1, 2, 3]}).set_index(['a', 'c'])

In [13]: df

Out[13]:

      val
a  c     
a  d    1
ab d    2
b  d    3

In [15]: df.xs('a', level='a', drop_level=False)

Out[15]:

     val
a c     
a d    1

In [16]: df.xs(contains('a'), level='a', drop_level=False)

Expected output:

Out[16]: 

      val
a  c     
a  d    1
ab d    2

Obviously that last call is not valid code; it only illustrates what I am after.

  • How can this be done elegantly?
  • Can it be done case-insensitively?
asked by salient, Feb 05 '23 10:02


1 Answer

Use boolean indexing with get_level_values and str.contains; pass case=False to make the match case-insensitive:

print(df.index.get_level_values('a'))
Index(['a', 'ab', 'b'], dtype='object', name='a')

print(df.index.get_level_values('a').str.contains('a'))
[ True  True False]

df1 = df[df.index.get_level_values('a').str.contains('a', case=False)]
print(df1)
      val
a  c     
a  d    1
ab d    2
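
For reference, the approach above can be put together as a self-contained script. This is only a sketch: the extra 'AB' row and the names level, mask_cs and mask_ci are illustrative additions that do not appear in the question, and regex=False is an optional choice that makes str.contains treat the pattern as a literal substring rather than a regular expression.

```python
import pandas as pd

# Frame from the question, plus one upper-case key ('AB') added here
# purely to show the effect of case=False (hypothetical extra row).
df = pd.DataFrame({'a': ['a', 'ab', 'b', 'AB'],
                   'c': ['d', 'd', 'd', 'd'],
                   'val': [1, 2, 3, 4]}).set_index(['a', 'c'])

# Pull out the labels of level 'a' as a flat Index
level = df.index.get_level_values('a')

# Case-sensitive literal substring match
mask_cs = level.str.contains('a', regex=False)

# Case-insensitive literal substring match
mask_ci = level.str.contains('a', case=False, regex=False)

# Boolean indexing keeps the rows where the mask is True
print(df[mask_cs])
print(df[mask_ci])
```

Building the mask once and reusing it keeps the selection readable. If the search string may contain regex metacharacters such as '.' or '(', keeping regex=False avoids surprising matches.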
answered by jezrael, Feb 16 '23 03:02