How can I identify which column(s) in my DataFrame contain a specific string, 'foo'?
>>> import pandas as pd
>>> df = pd.DataFrame({'A':[10,20,42], 'B':['foo','bar','blah'],'C':[3,4,5], 'D':['some','foo','thing']})
I want to find B and D here.
If I'm looking for a number (e.g. 42) instead of a string, I can generate a boolean mask like this:
>>> ~(df.where(df==42)).isnull().all()
A     True
B    False
C    False
D    False
dtype: bool
The same approach fails when I compare against a string:
>>> ~(df.where(df=='foo')).isnull().all()
TypeError: Could not compare ['foo'] with block values
I don't want to iterate over each column and row if possible (my actual data is much larger than this example). It feels like there should be a simple and efficient way.
How can I do this?
One way with underlying array data -
df.columns[(df.values=='foo').any(0)].tolist()
Sample run -
In [209]: df
Out[209]:
    A     B  C      D
0  10   foo  3   some
1  20   bar  4    foo
2  42  blah  5  thing
In [210]: df.columns[(df.values=='foo').any(0)].tolist()
Out[210]: ['B', 'D']
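To see why the one-liner works, here is a rough step-by-step sketch of the same expression on the sample frame. The intermediate names (values, hits, column_mask, matching_columns) are only illustrative and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 42], 'B': ['foo', 'bar', 'blah'],
                   'C': [3, 4, 5], 'D': ['some', 'foo', 'thing']})

# Mixed numeric and string columns come back as a single 2D array of dtype object.
values = df.values

# Element-wise comparison against the string gives a 2D boolean array
# with the same shape as the frame (rows x columns).
hits = values == 'foo'

# Reduce along axis 0 (down the rows): True for each column with at least one hit.
column_mask = hits.any(axis=0)

# Use the per-column mask to pick the matching column labels.
matching_columns = df.columns[column_mask].tolist()
print(matching_columns)
```

Because the comparison runs on an object array, every cell is checked individually, so numeric cells simply compare unequal to 'foo' instead of raising an error.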
If you are looking for just the column-mask -
In [205]: (df.values=='foo').any(0)
Out[205]: array([False, True, False, True], dtype=bool)
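If you would rather stay at the pandas level, a possible alternative (not from the answer above, just a sketch) is DataFrame.isin, which tests membership per cell and works on mixed-dtype frames. The names mask and cols are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 42], 'B': ['foo', 'bar', 'blah'],
                   'C': [3, 4, 5], 'D': ['some', 'foo', 'thing']})

# isin checks each cell against the given list of values;
# any() then reduces each column to a single boolean.
mask = df.isin(['foo']).any()

# Keep only the labels whose mask entry is True.
cols = mask[mask].index.tolist()
print(cols)
```

Here mask is a boolean Series indexed by column name, so it can also be used directly for column selection, e.g. df.loc[:, mask].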