 

How to use a variable inside a regular expression?

I'd like to use a variable inside a regex, how can I do this in Python?

    TEXTO = sys.argv[1]

    if re.search(r"\b(?=\w)TEXTO\b(?!\w)", subject, re.IGNORECASE):
        # Successful match
    else:
        # Match attempt failed
Pedro Lobito asked Aug 03 '11 17:08





2 Answers

You have to build the regex as a string:

    import re
    import sys

    TEXTO = sys.argv[1]
    my_regex = r"\b(?=\w)" + re.escape(TEXTO) + r"\b(?!\w)"

    if re.search(my_regex, subject, re.IGNORECASE):
        ...  # etc.

Note the use of re.escape so that if your text has special characters, they won't be interpreted as such.
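To see why the escaping matters, here is a minimal sketch. The helper name `contains_word` and the sample strings are made up for illustration; only `re.escape` and `re.search` come from the answer above.

```python
import re

def contains_word(needle: str, haystack: str) -> bool:
    """Return True if `needle` occurs in `haystack` as a whole word (case-insensitive)."""
    # Hypothetical helper: builds the same pattern as above from a variable.
    pattern = r"\b(?=\w)" + re.escape(needle) + r"\b(?!\w)"
    return re.search(pattern, haystack, re.IGNORECASE) is not None

# Same pattern, but without re.escape: the "." stays a wildcard.
unescaped = r"\b(?=\w)" + "a.c" + r"\b(?!\w)"

whole_word = contains_word("cat", "The Cat sat")      # standalone word, different case
embedded = contains_word("cat", "concatenate")        # "cat" buried inside a longer word
wildcard_hit = re.search(unescaped, "abc") is not None  # "." matched the "b"
escaped_miss = contains_word("a.c", "abc")            # escaped "." only matches a real dot
escaped_hit = contains_word("a.c", "x a.c y")

print(whole_word, embedded, wildcard_hit, escaped_miss, escaped_hit)
```

Without `re.escape`, user input such as `a.c` silently turns into a looser pattern than intended.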

Ned Batchelder answered Sep 25 '22 08:09



From Python 3.6 on you can also use literal string interpolation ("f-strings"). In your particular case the solution would be:

    if re.search(rf"\b(?=\w){TEXTO}\b(?!\w)", subject, re.IGNORECASE):
        ...  # do something

EDIT:

Since there have been some questions in the comments on how to deal with special characters, I'd like to extend my answer:

raw strings ('r'):

One of the main concepts you have to understand when dealing with special characters in regular expressions is to distinguish between string literals and the regular expression itself. It is very well explained here:

In short:

Let's say that, instead of finding a word boundary \b after TEXTO, you want to match the literal string \boundary. Then you have to write:

    TEXTO = "Var"
    subject = r"Var\boundary"

    if re.search(rf"\b(?=\w){TEXTO}\\boundary(?!\w)", subject, re.IGNORECASE):
        print("match")

This only works because we are using a raw string (the regex is prefixed with 'r'); otherwise we would have to write "\\\\boundary" in the regex (four backslashes). Additionally, without the 'r' prefix, '\b' would no longer be interpreted as a word boundary but as a backspace character!
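The difference between the two kinds of literals can be checked directly. This is just an illustrative sketch; the variable names are mine, not part of the original answer.

```python
import re

# Normal literal: "\b" is one character, the backspace (ASCII 8).
print(len("\b"), "\b" == "\x08")
# Raw literal: the backslash and the "b" are kept as two characters.
print(len(r"\b"))

subject = r"Var\boundary"

raw_pattern = r"Var\\boundary"      # two backslashes in a raw literal
plain_pattern = "Var\\\\boundary"   # four backslashes in a normal literal
same_pattern = raw_pattern == plain_pattern

raw_hit = re.search(raw_pattern, subject) is not None

# Without the r prefix the pattern starts with a backspace character,
# which does not occur in the text, so nothing matches.
backspace_hit = re.search("\bVar", "Var") is not None
boundary_hit = re.search(r"\bVar", "Var") is not None

print(same_pattern, raw_hit, backspace_hit, boundary_hit)
```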

re.escape:

Basically puts a backslash in front of any special character. Hence, if you expect a special character in TEXTO, you need to write:

    if re.search(rf"\b(?=\w){re.escape(TEXTO)}\b(?!\w)", subject, re.IGNORECASE):
        print("match")

NOTE: From Python 3.7 on, !, ", %, ', ,, /, :, ;, <, =, >, @, and ` are no longer escaped; only characters that can have a special meaning in a regex are still escaped. _ has not been escaped since Python 3.3 (see the re.escape documentation).
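If you are unsure what re.escape does to a given input, you can simply print it. A small sketch with made-up sample strings; the exact escaped output for some characters depends on your Python version, as noted above.

```python
import re

samples = ["a.b*c", "1+1=2", "snake_case", "(x|y)?"]

results = {}
for text in samples:
    escaped = re.escape(text)
    # An escaped pattern should match the original text literally.
    literal_ok = re.fullmatch(escaped, text) is not None
    results[text] = (escaped, literal_ok)
    print(f"{text!r:14} -> {escaped!r:22} literal match: {literal_ok}")
```

Whatever the version, the escaped pattern matches the original text character for character, which is the property you care about when inserting a variable into a regex.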

Curly braces:

If you want to use quantifiers within the regular expression using f-strings, you have to use double curly braces. Let's say you want to match TEXTO followed by exactly 2 digits:

    if re.search(rf"\b(?=\w){re.escape(TEXTO)}\d{{2}}\b(?!\w)", subject, re.IGNORECASE):
        print("match")
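To convince yourself what the doubled braces do, it helps to look at the resulting pattern string. A short sketch with assumed sample values for TEXTO and the subjects:

```python
import re

TEXTO = "Var"  # sample value, stands in for sys.argv[1]

# Doubled braces are emitted as literal braces, so the regex sees \d{2}.
pattern = rf"\b(?=\w){re.escape(TEXTO)}\d{{2}}\b(?!\w)"
print(pattern)

# With single braces the f-string evaluates the expression 2 and inserts "2",
# so the quantifier is lost and the regex sees \d2 instead.
wrong = rf"\d{2}"
print(wrong)

two_digits = re.search(pattern, "see var42 here", re.IGNORECASE) is not None
three_digits = re.search(pattern, "see Var423 here", re.IGNORECASE) is not None
print(two_digits, three_digits)
```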
airborne answered Sep 23 '22 08:09
