I want to use input from a user as a regex pattern for a search over some text. It works, but how I can handle cases where user puts characters that have meaning in regex? For example, the user wants to search for Word <code>(s)</code>: regex engine will take the <code>(s)</code> as a group. I want it to treat it like a string <code>"(s)" </code>. I can run <code>replace</code> on user input and replace the <code>(</code> with <code>$</code> and the <code>)</code> with <code>$</code> but the problem is I will need to do replace for every possible regex symbol. Do you know some better way ?

Use the <code>re.escape()</code> function for this: 4.2.3 <code>re</code> Module Contents <blockquote> escape(string) Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it. </blockquote> A simplistic example, search any occurence of the provided string optionally followed by 's', and return the match object. <pre class="prettyprint"><code>def simplistic_plural(word, text): word_or_plural = re.escape(word) + 's?' return re.match(word_or_plural, text) </code></pre>

You can use <code>re.escape()</code>: <blockquote> re.escape(string) Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it. </blockquote> <pre class="prettyprint"><code>>>> import re >>> re.escape('^a.*$') '\\^a\\.\\*\\$' </code></pre> If you are using a Python version < 3.7, this will escape non-alphanumerics that are not part of regular expression syntax as well. If you are using a Python version < 3.7 but >= 3.3, this will escape non-alphanumerics that are not part of regular expression syntax, except for specifically underscore (<code>_</code>).

Unfortunately, <code>re.escape()</code> is not suited for the replacement string: <pre class="prettyprint"><code>>>> re.sub('a', re.escape('_'), 'aa') '\\_\\_' </code></pre> A solution is to put the replacement in a lambda: <pre class="prettyprint"><code>>>> re.sub('a', lambda _: '_', 'aa') '__' </code></pre> because the return value of the lambda is treated by <code>re.sub()</code> as a literal string.

Escaping regex string

Tags:

python

regex

I want to use input from a user as a regex pattern for a search over some text. It works, but how I can handle cases where user puts characters that have meaning in regex?

For example, the user wants to search for Word (s): regex engine will take the (s) as a group. I want it to treat it like a string "(s)" . I can run replace on user input and replace the ( with $ and the ) with $ but the problem is I will need to do replace for every possible regex symbol.

Do you know some better way ?

782

asked Nov 11 '08 09:11

MichaelT

3 Answers

Use the re.escape() function for this:

4.2.3 re Module Contents

escape(string)

Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.

A simplistic example, search any occurence of the provided string optionally followed by 's', and return the match object.

def simplistic_plural(word, text):
    word_or_plural = re.escape(word) + 's?'
    return re.match(word_or_plural, text)

171

answered Oct 04 '22 16:10

ddaa

You can use re.escape():

re.escape(string) Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.

>>> import re
>>> re.escape('^a.*$')
'\\^a\\.\\*\\$'

If you are using a Python version < 3.7, this will escape non-alphanumerics that are not part of regular expression syntax as well.

If you are using a Python version < 3.7 but >= 3.3, this will escape non-alphanumerics that are not part of regular expression syntax, except for specifically underscore (_).

answered Oct 04 '22 17:10

gimel

Unfortunately, re.escape() is not suited for the replacement string:

>>> re.sub('a', re.escape('_'), 'aa')
'\\_\\_'

A solution is to put the replacement in a lambda:

>>> re.sub('a', lambda _: '_', 'aa')
'__'

because the return value of the lambda is treated by re.sub() as a literal string.

answered Oct 04 '22 16:10

Owen

Related questions
                            
                                Using global variables between files?
                            
                                Add column to dataframe with constant value
                            
                                Convert a Pandas DataFrame to a dictionary
                            
                                How can I obtain the element-wise logical NOT of a pandas Series?
                            
                                Django gives Bad Request (400) when DEBUG = False
                            
                                How to avoid .pyc files?
                            
                                Python Requests - No connection adapters
                            
                                How to remove specific elements in a numpy array
                            
                                Round number to nearest integer
                            
                                How do I loop through a list by twos? [duplicate]
                            
                                Read file from line 2 or skip header row
                            
                                How do I fix 'ImportError: cannot import name IncompleteRead'?
                            
                                Virtualenv Command Not Found
                            
                                How to fix Python indentation
                            
                                How to append multiple values to a list in Python
                            
                                How do you extract a column from a multi-dimensional array?
                            
                                List comprehension on a nested list?
                            
                                Defining private module functions in python
                            
                                How can I find where Python is installed on Windows?
                            
                                Why are some float < integer comparisons four times slower than others?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With