Escaping regex string

Tags: python, regex

I want to use input from a user as a regex pattern for a search over some text. It works, but how can I handle cases where the user enters characters that have special meaning in a regex?

For example, the user wants to search for Word (s): the regex engine will treat (s) as a group. I want it to be treated as the literal string "(s)". I could run replace on the user input, replacing ( with \( and ) with \), but then I would need to do a replace for every possible regex symbol.

Do you know a better way?

MichaelT asked Nov 11 '08 09:11



3 Answers

Use the re.escape() function for this:

4.2.3 re Module Contents

escape(string)

Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.

A simplistic example: search for any occurrence of the provided string, optionally followed by 's', and return the match object.

import re

def simplistic_plural(word, text):
    word_or_plural = re.escape(word) + 's?'
    # re.search() scans the whole text; re.match() would only match at the start
    return re.search(word_or_plural, text)
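
Applied to the example from the question, the idea might look like the following minimal sketch. The helper name find_literal and the sample text are hypothetical, chosen only for illustration.

```python
import re

def find_literal(needle, text):
    """Search text for needle, treating every character of needle literally."""
    return re.search(re.escape(needle), text)

text = "Plural form: Word (s) or Words"

# Unescaped, "(s)" is a capturing group, so the pattern means "Word s",
# which does not occur in the sample text.
print(re.search("Word (s)", text))

# Escaped, the parentheses are matched as ordinary characters.
print(find_literal("Word (s)", text).group())
```

The first call finds nothing because the pattern no longer contains literal parentheses; the second finds the text exactly as the user typed it.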

ddaa answered Oct 04 '22 16:10


You can use re.escape():

re.escape(string)

Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.

>>> import re
>>> re.escape('^a.*$')
'\\^a\\.\\*\\$'

In Python versions before 3.7, re.escape() also escapes non-alphanumerics that are not part of regular expression syntax. From Python 3.3 up to (but not including) 3.7, the underscore (_) is the one exception and is left unescaped.
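
Because the exact escaped text depends on the Python version, it is probably safer to rely on the behaviour than on the precise output. A small sketch, assuming Python 3.4 or later for re.fullmatch() and using made-up sample strings:

```python
import re

# Hypothetical sample inputs containing regex metacharacters and an underscore.
samples = ["Word (s)", "^a.*$", "a_b", "50% [off]", r"C:\temp"]

for s in samples:
    escaped = re.escape(s)
    # The escaped text may differ between Python versions,
    # but it should always match the original string literally.
    assert re.fullmatch(escaped, s) is not None
    print(repr(s), "->", repr(escaped))
```

The printed escapes vary between versions, while the round-trip check holds on all of them.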

gimel answered Oct 04 '22 17:10


Unfortunately, re.escape() is not suited for the replacement string. On Python versions before 3.3, where re.escape() still escapes the underscore:

>>> re.sub('a', re.escape('_'), 'aa')
'\\_\\_'

A solution is to put the replacement in a lambda:

>>> re.sub('a', lambda _: '_', 'aa')
'__'

because the return value of the lambda is treated by re.sub() as a literal string.
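
The same approach can be wrapped in a small helper for replacement text that comes from a user. This is only a sketch; the name replace_literal and the sample replacement are hypothetical.

```python
import re

def replace_literal(pattern, replacement, text):
    """Replace every match of pattern with replacement, taken verbatim."""
    # When the second argument is a callable, its return value is inserted
    # as-is: backslashes and group references in it are not interpreted.
    return re.sub(pattern, lambda match: replacement, text)

# Passed directly as a replacement string, r"\1_\n" would be parsed as a
# group reference and a newline escape; via the helper it is kept as typed.
print(replace_literal("a", r"\1_\n", "aa"))
```

Note that only the replacement is treated literally here; the pattern argument is still a regular expression and would need re.escape() if it also comes from user input.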

Owen answered Oct 04 '22 16:10