Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using a variable within a regular expression in Pandas str.contains()

I'm attempting to select rows from a dataframe using the pandas str.contains() function with a regular expression that contains a variable as shown below.

df = pd.DataFrame(["A test Case","Another Testing Case"], columns=list("A"))
variable = "test"
df[df["A"].str.contains(r'\b' + variable + '\b', regex=True, case=False)] #Returns nothing

While the above returns nothing, the following returns the appropriate row as expected

df[df["A"].str.contains(r'\btest\b', regex=True, case=False)] #Returns values as expected

Any help would be appreciated.

like image 285
neanderslob Avatar asked Dec 04 '18 22:12

neanderslob


People also ask

Is there a Contains () function in Python?

Python string __contains__() is an instance method and returns boolean value True or False depending on whether the string object contains the specified string object or not. Note that the Python string contains() method is case sensitive.

How do I use contains in pandas Python?

str. contains() function is used to test if pattern or regex is contained within a string of a Series or Index. The function returns boolean Series or Index based on whether a given pattern or regex is contained within a string of a Series or Index.

What does any () do in pandas?

The any() method returns one value for each column, True if ANY value in that column is True, otherwise False. By specifying the column axis ( axis='columns' ), the all() method returns True if ANY value in that axis is True.

How do I check if a string contains a substring panda?

Using “contains” to Find a Substring in a Pandas DataFrame The contains method returns boolean values for the Series with True for if the original Series value contains the substring and False if not. A basic application of contains should look like Series. str. contains("substring") .


1 Answers

Both word boundary characters must be inside raw strings. Why not use some sort of string formatting instead? String concatenation as a rule is generally discouraged.

df[df["A"].str.contains(fr'\b{variable}\b', regex=True, case=False)] 
# Or, 
# df[df["A"].str.contains(r'\b{}\b'.format(variable), regex=True, case=False)] 

             A
0  A test Case
like image 186
cs95 Avatar answered Oct 29 '22 21:10

cs95