My DataFrame has two columns:
Name Status
a I am Good
b Goodness!!!
c Good is what i feel
d Not Good-at-all
I want to filter rows in which Status has a string 'Good' as its exact word, not combined with any other words or characters.
So the output will be:
Name Status
a i am Good
c Good is what i feel
Two other rows had a 'Good' string in it but mixed with other characters, so should not be picked up.
I tried doing:
d = df[df['Status'].str.contains('Good')] # But all rows come up
I believe some regex like (r'\bGood\b', Status)
will do that, but this is not able to sum it up together. And how/where exactly can I fit the regex in a DataFrame filter condition to achieve this? And how to achieve startswith
or endswith
'Good' (exact word search)?
If you're defining "exact" to mean no other characters (including punctuation which defines a word boundary \b
), you could instead check for a leading and trailing space and/or beginning/end anchors:
>>> df[df['Status'].str.contains(r'(?:\s|^)Good(?:\s|$)')]
Name Status
0 a I am Good
2 c Good is what i feel
Explanation:
(?:\s|^)
is a non-capturing group looking for a space character (\s
) or the beginning of the string (^
).
Good
is the word you're looking for.
(?:\s|$)
is a non-capturing group looking for a space character (\s
) or the end of the string ($
).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With