
Why do Python regex strings sometimes work without using raw strings?

Python recommends using raw strings when defining regular expressions in the re module. From the Python documentation:

Regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python’s usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal.

However, in many cases this is not necessary, and you get the same result whether you use a raw string or not:

$ ipython

In [1]: import re

In [2]: m = re.search("\s(\d)\s", "a 3 c")

In [3]: m.groups()
Out[3]: ('3',)

In [4]: m = re.search(r"\s(\d)\s", "a 3 c")

In [5]: m.groups()
Out[5]: ('3',)

Yet in some cases the two forms behave differently:

In [6]: m = re.search("\s(.)\1\s", "a 33 c")

In [7]: m.groups()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-84a8d9c174e2> in <module>()
----> 1 m.groups()

AttributeError: 'NoneType' object has no attribute 'groups'

In [8]: m = re.search(r"\s(.)\1\s", "a 33 c")

In [9]: m.groups()
Out[9]: ('3',)

And when not using a raw string, you must escape the backslashes yourself:

In [10]: m = re.search("\\s(.)\\1\\s", "a 33 c")

In [11]: m.groups()
Out[11]: ('3',)

My question is: why do the non-escaped, non-raw regex strings work at all with special characters (as in command [2] above)?

Fiver asked Feb 05 '15 01:02


1 Answer

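The example above works because \s and \d are not escape sequences in Python, so the backslashes reach the regex engine intact. \1, by contrast, is a valid octal escape: Python turns it into the single character chr(1) before the re module ever sees the pattern, which is why command [6] finds no match. According to the docs: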

Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the string. 
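
One way to check this is to compare the string literals directly, without involving re at all. This is only an illustrative sketch; the variable names are made up for the example. Note that newer Python versions emit a warning for the unrecognized escapes in the first literal (DeprecationWarning since 3.6, SyntaxWarning since 3.12), although the code still runs.

```python
# "\s" and "\d" are not recognized escapes, so the backslashes are kept
# and the plain literal equals the raw one.
plain = "\s(\d)\s"
raw = r"\s(\d)\s"
print(plain == raw)   # True
print(len(plain))     # 8

# "\1" IS a recognized (octal) escape: it becomes the single character
# chr(1), so the regex engine never sees a backreference.
plain_backref = "\s(.)\1\s"
raw_backref = r"\s(.)\1\s"
print(plain_backref == raw_backref)           # False
print("\1" == chr(1))                         # True
print(len(plain_backref), len(raw_backref))   # 8 9
```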

But it's best to just use raw strings and not worry about what is or isn't a Python escape, or about having to fix the string later if you change the regex.
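
As a rough illustration of that advice, the sketch below runs the backreference pattern from the question as a raw string and as a hand-escaped string, then matches a literal backslash. The `text` and `path` values are made up for the example.

```python
import re

text = "a 33 c"

# Raw string: the regex engine receives \1 as a backreference to group 1.
m = re.search(r"\s(.)\1\s", text)
print(m.groups())   # ('3',)

# Equivalent non-raw form: every backslash doubled by hand.
m2 = re.search("\\s(.)\\1\\s", text)
print(m2.groups())  # ('3',)

# Matching one literal backslash: r"\\" is the two-character regex \\,
# which is the same string as the non-raw literal "\\\\".
path = "C:\\temp"   # the seven-character string C:\temp
backslash = re.search(r"\\", path).group()
print(len(backslash))    # 1
print(r"\\" == "\\\\")   # True
```

The raw form reads the same as the regex you intend, so adding a backreference or a literal backslash later does not require rechecking which sequences Python treats as escapes.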

Josie McClellan answered Sep 22 '22 22:09