
Find the index of a string value in a pandas DataFrame

How can I identify which column(s) in my DataFrame contain a specific string 'foo'?

Sample DataFrame:

>>> import pandas as pd
>>> df = pd.DataFrame({'A':[10,20,42], 'B':['foo','bar','blah'],'C':[3,4,5], 'D':['some','foo','thing']})

I want to find B and D here.

Searching for a number (e.g. 42) works. I can generate a boolean mask like this:

>>> ~(df.where(df==42)).isnull().all()

A     True
B    False
C    False
D    False
dtype: bool

But the same approach fails for strings:

>>> ~(df.where(df=='foo')).isnull().all()

TypeError: Could not compare ['foo'] with block values

I don't want to iterate over each column and row if possible (my actual data is much larger than this example). It feels like there should be a simple and efficient way.

How can I do this?

Ben asked Sep 27 '17 16:09


1 Answer

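One way is to work on the underlying array data: compare every value against 'foo', then reduce down the rows with any(0) to get one flag per column -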

df.columns[(df.values=='foo').any(0)].tolist()

Sample run -

In [209]: df
Out[209]: 
    A     B  C      D
0  10   foo  3   some
1  20   bar  4    foo
2  42  blah  5  thing

In [210]: df.columns[(df.values=='foo').any(0)].tolist()
Out[210]: ['B', 'D']

If you are looking for just the column-mask -

In [205]: (df.values=='foo').any(0)
Out[205]: array([False,  True, False,  True], dtype=bool)
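
For reference, here is a self-contained sketch of the approach above, alongside a pandas-only variant. The isin() variant is an assumption on my part and not part of the answer; it is included because element-wise membership tests tend to cope with mixed dtypes. The variable names (mask, cols_numpy, hits, cols_isin) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 42],
    'B': ['foo', 'bar', 'blah'],
    'C': [3, 4, 5],
    'D': ['some', 'foo', 'thing'],
})

# NumPy route from the answer: compare the underlying object array,
# then reduce along axis 0 (down the rows) to get one flag per column.
mask = (df.values == 'foo').any(axis=0)
cols_numpy = df.columns[mask].tolist()

# pandas-only alternative (an assumption, not from the answer):
# isin() tests membership element-wise; any() then reduces per column.
hits = df.isin(['foo']).any()
cols_isin = hits[hits].index.tolist()

print(cols_numpy)
print(cols_isin)
```

Both lists should name the same columns for this sample frame. On a large frame with mixed dtypes, df.values builds a single object array, so it may be worth timing both variants on the real data before choosing one.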
Divakar answered Oct 11 '22 13:10