Please refer to this Regular Expression HOWTO for Python 3:
https://docs.python.org/3/howto/regex.html#performing-matches
>>> p = re.compile('\d+')
>>> p.findall('12 drummers drumming, 11 pipers piping, 10 lords a-leaping')
['12', '11', '10']
I have read that for regular expressions containing '\', raw strings should be used, like r'\d+', but in this code snippet re.compile('\d+') is used without the r specifier, and it works fine. Why does it work in the first place? Why does this regular expression not need an 'r' preceding it?
Raw strings help you get the "source code" of a RegEx safely to the RegEx parser, which will then assign meaning to character sequences like \d, \w, \n, etc.
A Python raw string is created by prefixing a string literal with 'r' or 'R'. A raw string treats the backslash (\) as a literal character. This is useful when we want a string that contains backslashes and don't want them to be treated as escape characters.
Raw string literals are string literals that are designed to make it easier to include nested characters like quotation marks and backslashes that normally have meanings as delimiters and escape sequence starts. They're useful for, say, encoding text like HTML.
In Python, when you prefix a string with the letter r or R, such as r'...' and R'...', that string becomes a raw string. Unlike a regular string, a raw string treats backslashes (\) as literal characters.
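A minimal sketch of that difference, using '\n' because it is an escape Python does recognize. The variable names are only illustrative:

```python
# Compare a regular literal with its raw counterpart.
regular = '\n'     # one character: a newline
raw = r'\n'        # two characters: a backslash followed by 'n'

print(len(regular))     # 1
print(len(raw))         # 2
print(raw == '\\n')     # True: same as escaping the backslash by hand
```

So r'\n' is just a more readable way of writing '\\n'; both hand the two characters \ and n on to whatever consumes the string.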
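It happens to work because '\d' doesn't correspond to a special character the way '\n' or '\t' do, so Python leaves the backslash in place and the regex engine still receives \d. Note that since Python 3.6 an unrecognized escape like '\d' in a regular string triggers a DeprecationWarning (a SyntaxWarning from Python 3.12). Sometimes a raw string turns out the same as the regular string version. Generally, though, raw strings will ensure that you don't get any surprises in your expression.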
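To see the kind of surprise a raw string avoids, here is a small sketch using \b, which Python does treat as an escape (backspace) while the regex engine treats it as a word boundary. The sample text and variable names are made up for illustration:

```python
import re

# '\b' IS a recognized escape (backspace, U+0008), so the two literals differ.
plain = '\bcat\b'      # backspace + 'cat' + backspace
raw = r'\bcat\b'       # backslash-b + 'cat' + backslash-b (regex word boundary)

text = 'a cat sat'
print(re.findall(plain, text))   # []
print(re.findall(raw, text))     # ['cat']

# An explicitly escaped backslash gives the same string as the raw form.
print('\\d' == r'\d')            # True
```

The non-raw pattern silently looks for backspace characters and finds nothing, which is why using r'...' for every regex is the safer habit.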