 

Reversing Python's re.escape

How do I reverse re.escape? A blog post from 2007 says there is no reverse function, but is that still true ten years later?

Python 2's decode('string_escape') doesn't work on all escaped chars (such as space).

>>> re.escape(' ')
'\\ '
>>> re.escape(' ').decode('string-escape')
'\\ '

Python 3: Some suggest unicode_escape, codecs.escape_decode or ast.literal_eval, but no luck with spaces.

>>> re.escape(b' ')
b'\\ '
>>> re.escape(b' ').decode('unicode_escape')
'\\ '
>>> codecs.escape_decode(re.escape(b' '))
(b'\\ ', 2)
>>> ast.literal_eval(re.escape(b' '))
ValueError: malformed node or string: b'\\ '

So is this really the only thing that works?

>>> re.sub(r'\\(.)', r'\1', re.escape(' '))
' '
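A quick way to test the one-liner is to round-trip every ASCII character through re.escape() and back. The sketch below assumes Python 3.7+ (where re.escape() escapes only special characters); naive_unescape is just an illustrative name for the one-liner above. It suggests that the only character that fails is the newline: re.escape('\n') is a backslash followed by a literal newline, and . does not match a newline unless re.DOTALL is set. Passing flags=re.DOTALL to re.sub() (or starting the pattern with (?s)) closes that gap.

```python
import re


def naive_unescape(s):
    # The one-liner from the question: drop the backslash before any character.
    # Without re.DOTALL, "." matches anything except a newline.
    return re.sub(r'\\(.)', r'\1', s)


# Round-trip every ASCII character through re.escape() and the one-liner,
# collecting the ones that do not come back unchanged.
failures = [c for c in map(chr, range(128)) if naive_unescape(re.escape(c)) != c]
print(failures)  # ['\n'] on Python 3.7+
```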
asked Apr 27 '17 15:04 by Willem


1 Answer

So is this really the only thing that works?

>>> re.sub(r'\\(.)', r'\1', re.escape(' '))
' '

Yes. The source for the re module contains no unescape() function, so you're definitely going to have to write one yourself.

Furthermore, since Python 3.7 the re.escape() function is implemented with str.translate():

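def escape(pattern):
    """
    Escape special characters in a string.
    """
    # _special_chars_map (module level) maps the code point of each special
    # character to that character preceded by a backslash.
    if isinstance(pattern, str):
        return pattern.translate(_special_chars_map)
    else:
        # bytes: latin-1 maps each byte 0-255 to the code point with the same
        # value, so the str table can be reused and the result re-encoded.
        pattern = str(pattern, 'latin1')
        return pattern.translate(_special_chars_map).encode('latin1')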

… which, while it can transform a single character into multiple characters (e.g. [ → \[), cannot perform the reverse of that operation.
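To see why str.translate() cannot simply be run backwards, here is a small illustrative table covering only [ and ] (escape_table is a made-up name, not re's real _special_chars_map). Expanding one character into two works, but the inverse mapping would need the two-character key \[, and str.maketrans() only accepts string keys of length 1, so it raises a ValueError.

```python
# Illustrative table covering only "[" and "]"; re's real table
# (_special_chars_map) covers every special character.
escape_table = str.maketrans({'[': '\\[', ']': '\\]'})

# One character -> several characters works fine.
escaped = '[a]'.translate(escape_table)
print(escaped)  # \[a\]

# The inverse needs a two-character key, which str.maketrans() rejects.
inverse_error = None
try:
    str.maketrans({'\\[': '['})
except ValueError as exc:
    inverse_error = exc
print(type(inverse_error).__name__)  # ValueError
```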

Since there's no direct reversal of escape() available via str.translate(), a custom unescape() function using re.sub(), as described in your question, is the most straightforward solution.
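A minimal sketch of such an unescape() helper, assuming Python 3.7+ output, where re.escape() only ever puts a single backslash in front of a special character. It is not part of the re module; it accepts str or bytes like re.escape() does, and compiles the pattern with re.DOTALL so that an escaped newline is unescaped too. Older versions, including Python 2, escaped NUL as \000, which this sketch does not undo.

```python
import re

# re.DOTALL lets "." match a newline too; re.escape('\n') is a backslash
# followed by a literal newline, which must also be unescaped.
_UNESCAPE_STR = re.compile(r'\\(.)', re.DOTALL)
_UNESCAPE_BYTES = re.compile(rb'\\(.)', re.DOTALL)


def unescape(pattern):
    """Hypothetical inverse of re.escape() (Python 3.7+ output only).

    Removes the backslash in front of every escaped character.
    Accepts str or bytes, mirroring re.escape().
    """
    if isinstance(pattern, bytes):
        return _UNESCAPE_BYTES.sub(rb'\1', pattern)
    return _UNESCAPE_STR.sub(r'\1', pattern)


# Round-trip a few samples, including space, tab, newline and backslash.
samples = ['a b', '[x]', 'tab\there', 'line\nbreak', 'back\\slash', '1.5*2']
roundtrip_ok = all(unescape(re.escape(s)) == s for s in samples)
bytes_ok = unescape(re.escape(b'a b\n')) == b'a b\n'
print(roundtrip_ok, bytes_ok)  # True True
```

Because matching runs left to right without overlaps, an escaped backslash (\\) is consumed as one unit and comes back as a single backslash.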

answered Oct 20 '22 17:10 by Zero Piraeus