df (a pandas DataFrame) has three rows.
col_name
"This is Donald."
"His hands are so small"
"Why are his fingers so short?"
I'd like to extract the row that contains "is" and "small".
If I do
df.col_name.str.contains("is|small", case=False)
Then it also catches "His", which I don't want.
Is the query below the right way to match only whole words in a Series?
df.col_name.str.contains("\bis\b|\bsmall\b", case=False)
No, the regex /bis/b|/bsmall/b will fail because you are using /b, not \b, which means "word boundary".
Change that and you get a match. I would recommend using
\b(is|small)\b
This regex is a little faster and a little more legible, at least to me. Remember to put it in a raw string (r"\b(is|small)\b") so you don't have to escape the backslashes.
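As a minimal sketch of the suggestion above, here is the whole thing run against the three rows from the question. The DataFrame is rebuilt from the question's data. The only change from the recommended pattern is that the group is written as non-capturing, (?:...), which is my own adjustment: pandas emits a UserWarning when a pattern passed to str.contains has capture groups, and the matching behaviour is the same either way.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "col_name": [
        "This is Donald.",
        "His hands are so small",
        "Why are his fingers so short?",
    ]
})

# Raw string: \b reaches the regex engine as a word boundary.
# In a normal string, "\b" is Python's backspace character ("\x08").
# (?:...) is a non-capturing group, used here only to avoid the
# pandas warning about match groups in str.contains.
pattern = r"\b(?:is|small)\b"

# Boolean mask: True where the row contains "is" or "small" as a whole word.
mask = df["col_name"].str.contains(pattern, case=False)

# Keep only the matching rows.
matches = df[mask]
print(matches)
```

With this pattern the first two rows are selected: the first through the standalone word "is", the second through "small". "His" and "his" no longer count as matches on their own, so the third row is dropped.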