I'm attempting to select rows from a dataframe using the pandas str.contains()
function with a regular expression that contains a variable as shown below.
import pandas as pd
df = pd.DataFrame(["A test Case", "Another Testing Case"], columns=["A"])
variable = "test"
df[df["A"].str.contains(r'\b' + variable + '\b', regex=True, case=False)]  # Returns nothing
While the above returns nothing, the following returns the appropriate row as expected:
df[df["A"].str.contains(r'\btest\b', regex=True, case=False)] #Returns values as expected
Any help would be appreciated.
Python's string __contains__() is an instance method that returns the boolean value True or False depending on whether the string contains the specified substring. Note that __contains__() (and the `in` operator that calls it) is case sensitive.
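As a quick illustration of the point above, here is a minimal sketch showing that the `in` operator delegates to `__contains__()` and that the check is case sensitive. The sample string and variable names are made up for the example.

```python
text = "A test Case"

# The `in` operator delegates to str.__contains__()
has_lower = "test" in text            # lowercase "test" is present
has_upper = "Test" in text            # capitalised "Test" is not: the check is case sensitive
explicit = text.__contains__("test")  # equivalent to `"test" in text`

# One common way to ignore case is to normalise both sides first
has_any_case = "Test".lower() in text.lower()

print(has_lower, has_upper, explicit, has_any_case)  # True False True True
```

In everyday code the `in` form is preferred; calling `__contains__()` directly is shown only to make the relationship visible.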
The Series.str.contains() function tests whether a pattern or regex is contained within each string of a Series or Index. It returns a boolean Series or Index with one True/False entry per element.
By default, the DataFrame any() method returns one value for each column: True if ANY value in that column is True, otherwise False. When you specify the column axis (axis='columns'), any() instead returns one value for each row: True if ANY value in that row is True.
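The difference between the two axes is easier to see on a small frame. This is a sketch with a hypothetical boolean DataFrame named `flags`; the column names `x` and `y` are placeholders.

```python
import pandas as pd

# Hypothetical boolean data: column "x" has one True, column "y" has none
flags = pd.DataFrame({"x": [True, False, False], "y": [False, False, False]})

per_column = flags.any()              # default axis: one result per column
per_row = flags.any(axis="columns")   # one result per row

print(per_column.tolist())  # [True, False]
print(per_row.tolist())     # [True, False, False]
```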
Using "contains" to Find a Substring in a Pandas DataFrame
The contains method returns a boolean value for each element of the Series: True if the original value contains the substring and False if not. A basic application of contains looks like Series.str.contains("substring").
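A minimal sketch of these variants, reusing the two sample strings from the question. The mask variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": ["A test Case", "Another Testing Case"]})

# Plain substring match; case sensitive by default, so "Testing" does not match
mask_sub = df["A"].str.contains("test", regex=False)

# Case-insensitive substring match; "Testing" now matches too
mask_ci = df["A"].str.contains("test", case=False, regex=False)

# Whole-word regex match; "Testing" is excluded by the word boundaries
mask_word = df["A"].str.contains(r"\btest\b", case=False, regex=True)

print(mask_sub.tolist())   # [True, False]
print(mask_ci.tolist())    # [True, True]
print(mask_word.tolist())  # [True, False]
```

Any of these masks can be used to select rows with `df[mask]`.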
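Both word-boundary escapes must be inside raw strings. In your code the trailing '\b' is an ordinary string literal, so Python turns it into a backspace character before the regex engine ever sees it. Why not use some sort of string formatting instead? It keeps the whole pattern in a single raw string, and building strings by concatenation is generally discouraged anyway.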
df[df["A"].str.contains(fr'\b{variable}\b', regex=True, case=False)]
# Or,
# df[df["A"].str.contains(r'\b{}\b'.format(variable), regex=True, case=False)]
             A
0  A test Case
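To see why the original concatenation fails, the sketch below compares the two literals and then runs both patterns against the question's data. Wrapping the variable in `re.escape()` is an addition not in the answer above; it is only needed if the variable might contain regex metacharacters such as `.` or `(`.

```python
import re
import pandas as pd

df = pd.DataFrame({"A": ["A test Case", "Another Testing Case"]})
variable = "test"

# '\b' in a normal string is a single backspace character (code point 8);
# r'\b' is two characters, a backslash and 'b', which regex reads as a word boundary.
plain = '\b'
raw = r'\b'
print(len(plain), ord(plain))  # 1 8
print(len(raw))                # 2

# The question's pattern ends in a literal backspace, so nothing matches
broken = r'\b' + variable + '\b'
# Raw f-string keeps both boundaries intact; re.escape guards against metacharacters
fixed = rf'\b{re.escape(variable)}\b'

matches_broken = df[df["A"].str.contains(broken, regex=True, case=False)]
matches_fixed = df[df["A"].str.contains(fixed, regex=True, case=False)]

print(len(matches_broken))          # 0
print(matches_fixed["A"].tolist())  # ['A test Case']
```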