What is the most concise way to select all rows where any column contains a string in a Pandas dataframe?
For example, given the following dataframe, what is the best way to select those rows where the value in any column contains a b?
import pandas as pd

df = pd.DataFrame({
    'x': ['foo', 'foo', 'bar'],
    'y': ['foo', 'foo', 'foo'],
    'z': ['foo', 'baz', 'foo']
})
I'm inexperienced with Pandas, and the best I've come up with so far is the rather cumbersome df[df.apply(lambda r: r.str.contains('b').any(), axis=1)]. Is there a simpler solution?
Critically, I want to check for a match in any column, not a particular column. Other similar questions, as best I can tell, only address a single column or a list of columns.
This question never received an accepted answer, but the question itself and the comments already contain solutions that worked well for me, and I didn't find them anywhere else I looked. So I'm copying them here for anyone who finds them useful. I added case=False for a case-insensitive search.
Solution from @Reason:
This is the approach from the question itself (the one described there as "rather cumbersome"); it worked for me:
df[df.apply(lambda r: r.str.contains('b', case=False).any(), axis=1)]
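As a side note that is not from the original thread: the row-wise apply above runs .str.contains once per row. A column-wise sketch needs only one vectorized pass per column and is typically faster on tall frames, assuming all columns hold strings:

# Build one boolean mask per column, then keep rows where any column matched;
# na=False treats missing values as "no match".
mask = df.apply(lambda col: col.str.contains('b', case=False, na=False))
df[mask.any(axis=1)]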
Solution from @rbinnun:
This one worked for me on a test dataset, but on one real data set it raised the Unicode error shown below. Still, I think it's generally a good solution too:
df[df.apply(lambda row: row.astype(str).str.contains('b', case=False).any(), axis=1)]
The astype(str) conversion takes care of non-string columns, NaNs, etc.
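For illustration (df2 is a hypothetical frame, not from the thread), this is the kind of input that the astype(str) version is meant to handle:

# A numeric column plus a missing value: each row is stringified before the
# substring test, so the int and the NaN don't break .str.contains.
df2 = pd.DataFrame({'x': ['foo', 'bar'], 'n': [1, None]})
df2[df2.apply(lambda row: row.astype(str).str.contains('b', case=False).any(), axis=1)]
# keeps only the 'bar' row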
UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in position 5: ordinal not in range(128)
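The u'...' in that traceback points at Python 2, where astype(str) calls str() and chokes on non-ASCII text. One possible workaround, sketched under the assumption that the values being searched live in object (string-like) columns, is to skip the conversion entirely and test only those columns:

# Test only the object columns; na=False turns cells that .str can't handle
# into "no match", so nothing needs to pass through str().
str_cols = df.select_dtypes(include=['object'])
mask = str_cols.apply(lambda col: col.str.contains('b', case=False, na=False))
df[mask.any(axis=1)]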