 

Python pandas.Series.str.contains: match a WHOLE WORD

df (a pandas DataFrame) has three rows:

col_name
"This is Donald."
"His hands are so small"
"Why are his fingers so short?"

I'd like to extract the rows that contain the whole word "is" or the whole word "small".

If I run

df.col_name.str.contains("is|small", case=False)

then it also matches the "is" inside "His"/"his", so the third row is returned as well, which I don't want.
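A minimal reproduction of the problem, assuming the column is literally named col_name and holds the three sentences above. The plain alternation is a substring search, so "is" is found inside "This", "His" and "his":

```python
import pandas as pd

# The three example rows from the question
df = pd.DataFrame({"col_name": [
    "This is Donald.",
    "His hands are so small",
    "Why are his fingers so short?",
]})

# Substring search: "is" also matches inside "This", "His" and "his"
naive = df.col_name.str.contains("is|small", case=False)
print(naive.tolist())  # [True, True, True]
```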

Is the query below the right way to match whole words in a Series?

df.col_name.str.contains("\bis\b|\bsmall\b", case=False)
aerin asked Sep 07 '16 00:09


1 Answer

No. Written as /bis/b|/bsmall/b, the regex fails because /b is not \b, the regex escape for a "word boundary".

Change that and you get a match. I would recommend using

\b(is|small)\b

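This version factors the \b anchors out of the alternation, which I find a little more legible and which may also be slightly faster. Remember to put it in a raw string, r"\b(is|small)\b": in an ordinary Python string literal "\b" is the backspace character, so without the r prefix (or doubled backslashes) the regex engine never sees a word boundary.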
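A sketch of the whole fix in pandas, assuming the col_name column and the three rows from the question. It uses a non-capturing group (?:is|small) instead of (is|small): the mask is the same, but pandas emits a UserWarning when a str.contains pattern contains capture groups. The first two prints show why the r prefix matters.

```python
import pandas as pd

# Without the r prefix, "\b" is the backspace character, not a word boundary
print("\b" == "\x08")   # True
print(r"\b" == "\\b")   # True: a backslash followed by "b"

df = pd.DataFrame({"col_name": [
    "This is Donald.",
    "His hands are so small",
    "Why are his fingers so short?",
]})

# Raw string + word boundaries; (?:...) groups without capturing
pattern = r"\b(?:is|small)\b"
whole_word = df.col_name.str.contains(pattern, case=False)
print(whole_word.tolist())  # [True, True, False]

# Keep only the rows that contain one of the whole words
print(df[whole_word])
```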

Laurel answered Oct 12 '22 12:10