I'm trying to match rows of a Pandas DataFrame that contain one string and don't contain another. For example:
import pandas
df = pandas.Series(['ab1', 'ab2', 'b2', 'c3'])
df[df.str.contains("b")]
Output:
0 ab1
1 ab2
2 b2
dtype: object
Desired output:
2 b2
dtype: object
Question: is there an elegant way of saying something like this?
df[[df.str.contains("b")==True] and [df.str.contains("a")==False]]
# Doesn't give desired outcome
How do you check if a string does not contain a character in Python?
Using Python's "in" operator: the simplest and typically fastest way to check whether a string contains a substring in Python is the "in" operator. The expression evaluates to True if the substring occurs in the string and to False otherwise; "not in" gives the negated check.
The easiest and most effective way to see if a string contains a substring is an if ... in statement: the in expression evaluates to True if the substring is found. Alternatively, the str.find() method returns the index at which the substring starts, or -1 if Python can't find the substring.
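As a minimal sketch of the two plain-Python checks described above (the sample string is taken from the question; the variable names are only illustrative):

```python
text = "ab1"

# `in` evaluates to True when the substring occurs anywhere in the string
has_b = "b" in text
# `not in` is the negated form of the same check
lacks_c = "c" not in text

# str.find returns the start index of the first match, or -1 if there is none
index_of_b = text.find("b")
index_of_c = text.find("c")

print(has_b, lacks_c, index_of_b, index_of_c)
```

These checks work on a single string; for a whole Series the vectorised str.contains accessor is the usual counterpart.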
Getting rows where values do not contain a substring in a Pandas DataFrame: to get rows where values do not contain a substring, apply the negation operator ~ to the result of str.contains(), as in ~Series.str.contains("substring").
Using "contains" to find a substring in a Pandas DataFrame: the contains method returns a boolean Series, with True where the original value contains the substring and False where it does not. A basic call looks like Series.str.contains("substring").
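A short sketch of str.contains combined with the ~ negation, using the sample values from the question. The name s and the choice of regex=False (to treat the pattern as a literal substring) are assumptions made for this illustration:

```python
import pandas as pd

s = pd.Series(['ab1', 'ab2', 'b2', 'c3'])

# Boolean mask: True where the value contains the literal substring "a"
mask = s.str.contains("a", regex=False)

# ~ inverts the mask, keeping only the rows that do NOT contain "a"
without_a = s[~mask]
print(without_a)
```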
You're almost there; you just haven't got the syntax quite right. It should be:
df[(df.str.contains("b") == True) & (df.str.contains("a") == False)]
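To illustrate why the attempt in the question fails, here is a small sketch. Python's `and` applied to two non-empty lists just returns the second list, so the "b" condition is dropped, whereas `&` combines the masks element by element. The parentheses in the line above matter because `&` binds more tightly than `==`. The names has_a, has_b, and combined are illustrative only:

```python
import pandas as pd

df = pd.Series(['ab1', 'ab2', 'b2', 'c3'])
has_b = df.str.contains("b")
has_a = df.str.contains("a")

# `and` on two non-empty lists evaluates to the second list,
# so the first condition never takes part in the selection
combined = [has_b == True] and [has_a == False]
print(len(combined))  # one element: only the "a" condition survives

# `&` is the element-wise AND for boolean Series; ~ negates a mask
mask = has_b & ~has_a
print(df[mask])
```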
Another approach, which might be cleaner if you have a lot of conditions to apply, would be to chain your filters together with reduce or a loop:
from functools import reduce
filters = [("a", False), ("b", True)]
reduce(lambda df, f: df[df.str.contains(f[0]) == f[1]], filters, df)
#outputs b2
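The loop variant mentioned above could look roughly like this. It is a sketch rather than the answerer's own code: instead of re-filtering the Series at each step, it accumulates one boolean mask, and the names mask, substring, and wanted are assumptions for the example:

```python
import pandas as pd

df = pd.Series(['ab1', 'ab2', 'b2', 'c3'])
filters = [("a", False), ("b", True)]

# Start with an all-True mask and narrow it with each (substring, wanted) pair
mask = pd.Series(True, index=df.index)
for substring, wanted in filters:
    mask &= df.str.contains(substring, regex=False) == wanted

result = df[mask]
print(result)
```

Building a single mask means the Series is indexed only once, which may be easier to debug when there are many conditions.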
You can use .loc and ~ to index:
df.loc[(df.str.contains("b")) & (~df.str.contains("a"))]
2 b2
dtype: object
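If the data may contain missing values (an assumption; the question's sample has none), the missing entries propagate into the mask returned by str.contains, which can make ~ or boolean indexing fail. A hedged sketch using the na parameter of str.contains to treat missing values as non-matches:

```python
import pandas as pd

# Same sample as the question, plus one missing value (added for illustration)
s = pd.Series(['ab1', 'ab2', 'b2', None, 'c3'])

# na=False makes missing entries count as "does not contain"
has_b = s.str.contains("b", na=False)
has_a = s.str.contains("a", na=False)

result = s.loc[has_b & ~has_a]
print(result)
```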
Either (here ts is the Series from the question):
>>> ts.str.contains('b') & ~ts.str.contains('a')
0 False
1 False
2 True
3 False
dtype: bool
or use regex:
>>> ts.str.contains('^[^a]*b[^a]*$')
0 False
1 False
2 True
3 False
dtype: bool
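As a quick sanity check, the two approaches above can be compared on the question's data. This sketch assumes ts holds the same four values as the Series in the question; by_masks and by_regex are illustrative names:

```python
import pandas as pd

ts = pd.Series(['ab1', 'ab2', 'b2', 'c3'])

# Two separate masks combined with & and ~
by_masks = ts.str.contains('b') & ~ts.str.contains('a')

# One anchored regex: a "b" surrounded only by characters other than "a"
by_regex = ts.str.contains('^[^a]*b[^a]*$')

print(by_masks.equals(by_regex))
print(ts[by_regex])
```

The regex keeps everything in one call, while the two-mask form is usually easier to read and extend with further conditions.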