How to filter a pandas dataframe by cells that DO NOT contain a substring?

Tags: python, pandas

I want to filter a dataframe to find rows which do not contain the string 'site'.

I know how to filter for rows which do contain 'site' but have not been able to get the reverse working. Here is what I have so far:

def rbs(): #removes blocked sites
    frame = fill_rate()
    mask = frame[frame['Media'].str.contains('Site')==True]
    frame = (frame != mask)
    return frame

But this returns an error, of course.

asked Jun 11 '15 20:06 by bpr



1 Answer

Just do frame[~frame['Media'].str.contains('Site')]

The ~ operator inverts the boolean mask element-wise, so only the rows where str.contains is False are kept.

So your method becomes:

def rbs(): #removes blocked sites
    frame = fill_rate()
    return frame[~frame['Media'].str.contains('Site')]
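
To see the negation at work on its own, here is a minimal sketch with made-up data. Only the 'Media' column name comes from the question; the sample values and the 'Fill' column are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the 'Media' column name is from the question.
frame = pd.DataFrame({
    "Media": ["Site A", "Print", "Radio", "Site B"],
    "Fill": [0.9, 0.5, 0.7, 0.8],
})

# Boolean Series: True where the cell contains the substring 'Site'.
contains_site = frame["Media"].str.contains("Site")

# ~ flips every True/False, so indexing keeps the rows WITHOUT 'Site'.
without_site = frame[~contains_site]
remaining = without_site["Media"].tolist()
print(remaining)
```

Note that str.contains is case-sensitive by default, so 'Site' will not match 'site'. If that matters, passing case=False makes the match case-insensitive.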

EDIT

Judging by your errors, it looks like the column has NaN values. str.contains returns NaN for those rows rather than True or False, so you have to filter them out first. Your method then becomes:

def rbs(): #removes blocked sites
    frame = fill_rate()
    frame = frame[frame['Media'].notnull()]
    return frame[~frame['Media'].str.contains('Site')]

notnull() filters out the rows with missing values before the substring test runs.
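
As a rough comparison, the sketch below puts the notnull approach next to an alternative: the na parameter of str.contains. The sample data is made up, and whether rows with missing 'Media' should be dropped or kept is an assumption you would need to decide for your own data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value in 'Media'.
frame = pd.DataFrame({"Media": ["Site A", np.nan, "Print"]})

# Option 1 (as in the answer): drop missing values first, then negate.
non_null = frame[frame["Media"].notnull()]
option_1 = non_null[~non_null["Media"].str.contains("Site")]

# Option 2: na=False treats a missing value as "does not contain",
# so after negation the NaN row is kept instead of dropped.
option_2 = frame[~frame["Media"].str.contains("Site", na=False)]

print(option_1)
print(option_2)
```

The two options differ only in what happens to the NaN rows: the first removes them, the second keeps them in the result.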

answered Sep 28 '22 16:09 by EdChum