Filter dataframe rows containing any of a set of strings in Python

I have a dataframe df like this:

A      B
12     A cat
24     The dog
54     An elephant

I have to filter rows based on whether the values in column B contain any string from a list. I can do that for a single string, "cat", as follows:

df[df["B"].str.contains("cat", case=False, na=False)]

This returns:

A      B
12     A cat

But now I want to filter it for a list of strings, i.e. ['cat', 'dog',.....], and get:

A      B
12     A cat
24     The dog

I can do that using a for loop, but I am looking for a pandas way of doing this. I am using Python 3 and pandas, and have been searching Stack Overflow for solutions for the past 2 days.

Akash Kumar asked Mar 21 '26 21:03


1 Answer

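Join the words with | (regex OR) and wrap the alternation in a non-capturing group between \b word boundaries, so that the boundaries apply to every word rather than only to the start of the first and the end of the last:

L = ['cat', 'dog']
pat = r'\b(?:{})\b'.format('|'.join(L))
df[df["B"].str.contains(pat, case=False, na=False)]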
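
For reference, here is a minimal, self-contained sketch of the same idea, assuming the sample data from the question. The variable names `words` and `result` and the use of `re.escape` are illustrative additions, not part of the original answer. Escaping guards against words that contain regex metacharacters. The non-capturing group `(?:...)` makes `\b` apply to each alternative, so words such as "category" are not matched.

```python
import re

import pandas as pd

# Sample frame mirroring the data shown in the question.
df = pd.DataFrame(
    {"A": [12, 24, 54], "B": ["A cat", "The dog", "An elephant"]}
)

# Hypothetical search list; extend it with whatever words are needed.
words = ["cat", "dog"]

# Escape each word so it is matched literally, join with | (regex OR),
# and wrap the alternation in a non-capturing group so that the \b
# word boundaries apply to every alternative.
pat = r"\b(?:{})\b".format("|".join(re.escape(w) for w in words))

# Keep rows whose B value contains any of the words (case-insensitive);
# na=False treats missing values as non-matches.
result = df[df["B"].str.contains(pat, case=False, na=False)]
print(result)
```

If substring matches are wanted instead (for example "cat" inside "category"), dropping the two `\b` anchors from the pattern should be enough.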

jezrael answered Mar 23 '26 09:03


