
Python Pandas: String Contains and Doesn't Contain

I'm trying to select the entries of a pandas Series that contain one string and don't contain another. For example:

import pandas
df = pandas.Series(['ab1', 'ab2', 'b2', 'c3'])
df[df.str.contains("b")]

Output:

0    ab1
1    ab2
2     b2
dtype: object

Desired output:

2     b2
dtype: object

Question: is there an elegant way of saying something like this?

df[[df.str.contains("b")==True] and [df.str.contains("a")==False]]
# Doesn't give desired outcome
Sam Perry asked Dec 03 '15 00:12



3 Answers

You're almost there, you just haven't got the syntax quite right, it should be:

df[(df.str.contains("b") == True) & (df.str.contains("a") == False)]
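As a rough sketch of why the original attempt misbehaves: Python's `and` applied to two non-empty lists simply returns the second list, so the "b" condition is thrown away before pandas ever sees it. The element-wise `&` operator combines the two masks instead. The names `s`, `first`, `second` and `picked` below are only for illustration.

```python
import pandas as pd

# Same data as in the question (called `df` there, although it is a Series).
s = pd.Series(['ab1', 'ab2', 'b2', 'c3'])

# `and` returns its second operand when the first one is truthy,
# and any non-empty list is truthy.
first, second = [1], [2]
picked = first and second  # this is just `second`

# `&` combines the two boolean masks element by element.
has_b = s.str.contains("b")
has_a = s.str.contains("a")
result = s[has_b & ~has_a]
print(result)
```

Wrapping each condition in parentheses matters when the comparisons are written inline, because `&` binds more tightly than `==`.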

Another approach, which might be cleaner if you have a lot of conditions to apply, would be to chain your filters together with reduce or a loop:

from functools import reduce
filters = [("a", False), ("b", True)]
reduce(lambda df, f: df[df.str.contains(f[0]) == f[1]], filters, df)
#outputs b2
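The loop variant is not spelled out above, so here is a minimal sketch of one way it could look, building a single mask instead of re-filtering the Series each time. The names `s`, `mask`, `substring` and `wanted` are illustrative, and `regex=False` is an assumption that the patterns are literal substrings rather than regular expressions.

```python
import pandas as pd

s = pd.Series(['ab1', 'ab2', 'b2', 'c3'])

# Each entry is (substring, whether the substring should be present).
filters = [("a", False), ("b", True)]

# Start with everything selected, then narrow the mask one condition at a time.
mask = pd.Series(True, index=s.index)
for substring, wanted in filters:
    mask = mask & (s.str.contains(substring, regex=False) == wanted)

result = s[mask]
print(result)
```

Keeping one mask aligned to the original index means the Series is only indexed once at the end.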
maxymoo answered Oct 05 '22 05:10


You can use .loc and ~ to index:

df.loc[(df.str.contains("b")) & (~df.str.contains("a"))]

2    b2
dtype: object
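One caveat, sketched under an assumption that goes beyond the question: if the Series has missing values, `str.contains` returns a missing result for those entries by default, which may not work as a boolean mask. Passing `na=False` treats missing entries as "does not contain". The `None` entry below is made up for illustration.

```python
import pandas as pd

# Hypothetical data: the last entry is missing (not part of the original question).
s = pd.Series(['ab1', 'ab2', 'b2', None])

# na=False makes missing entries count as False in both masks.
has_b = s.str.contains("b", na=False)
has_a = s.str.contains("a", na=False)

result = s.loc[has_b & ~has_a]
print(result)
```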
lstodd answered Oct 05 '22 03:10


Either (where ts is the Series from the question):

>>> ts.str.contains('b') & ~ts.str.contains('a')
0    False
1    False
2     True
3    False
dtype: bool

or use regex:

>>> ts.str.contains('^[^a]*b[^a]*$')
0    False
1    False
2     True
3    False
dtype: bool
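Both expressions above only produce a boolean mask. A small sketch of using them to pull out the desired row, assuming `ts` holds the same data as the question's Series (the names `by_masks` and `by_regex` are illustrative):

```python
import pandas as pd

# Assumed to be the same data as the Series in the question.
ts = pd.Series(['ab1', 'ab2', 'b2', 'c3'])

# Index with the combined masks, then with the single regex.
by_masks = ts[ts.str.contains('b') & ~ts.str.contains('a')]
by_regex = ts[ts.str.contains('^[^a]*b[^a]*$')]

print(by_masks)
print(by_regex.equals(by_masks))
```

On this data the two selections agree. The regex packs both conditions into one pattern, which some may find harder to read than two explicit masks.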
behzad.nouri answered Oct 05 '22 05:10