I am reading through http://docs.python.org/2/library/re.html. According to this, the "r" in Python's re.compile(r' pattern flags') refers to the raw string notation:
The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation.
Would it be fair to say then that:
re.compile(r pattern) means that "pattern" is a regex, while re.compile(pattern) means that "pattern" is an exact match?
Regular expressions, called regexes for short, are descriptions of a pattern of text. For example, a \d in a regex stands for a digit character, that is, any single numeral 0 to 9. A regex can be written in Python to match a string of three digits, a hyphen, three more digits, another hyphen, and four digits.
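The pattern itself is not shown in the text above, so here is a minimal sketch of what such a regex might look like. The pattern `\d{3}-\d{3}-\d{4}`, the name `phone_pattern`, and the sample text are all assumptions made for illustration.

```python
import re

# Assumed pattern: three digits, a hyphen, three digits, a hyphen, four digits.
# The raw string keeps each backslash intact for the re module.
phone_pattern = re.compile(r"\d{3}-\d{3}-\d{4}")

# Hypothetical sample input.
sample_text = "Call 415-555-1234 tomorrow."

match = phone_pattern.search(sample_text)
found = match.group() if match else None
print(found)
```

`search` returns `None` when nothing matches, which is why the result is checked before calling `group()`.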
re.compile(pattern, flags=0): We can compile a regular expression pattern into a pattern object, which can then be used for pattern matching. It also lets you search with the same pattern again without rewriting it.
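As a rough sketch of that reuse, the snippet below compiles one pattern and applies it to several strings. The pattern, the name `word_pattern`, and the input list are hypothetical; only `re.compile`, `re.IGNORECASE`, and `findall` come from the standard library.

```python
import re

# Compile once: re.compile takes the pattern and optional flags.
word_pattern = re.compile(r"[a-z]+", re.IGNORECASE)

# Hypothetical inputs, searched with the same compiled pattern object.
lines = ["Hello world", "1234", "Raw strings"]
results = [word_pattern.findall(line) for line in lines]
print(results)
```

The compiled object carries its flags with it, so they do not need to be repeated on each call.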
R functions for pattern matching: in R (not Python), if the regular expression, pattern, matches a particular element in the vector string, a function such as grep() returns that element's index. To return the actual matching element values instead, set value=TRUE.
As @PauloBu stated, the r string prefix is not specifically related to regexes, but to strings generally in Python.
Normal strings use the backslash character as an escape character for special characters (like newlines):
>>> print('this is \n a test')
this is
 a test
The r prefix tells the interpreter not to do this:
>>> print(r'this is \n a test')
this is \n a test
This is important in regular expressions, as you need the backslash to make it to the re module intact. In particular, \b matches the empty string specifically at the start and end of a word. re expects the string \b, but normal string interpretation converts '\b' to the ASCII backspace character, so you need to either explicitly escape the backslash ('\\b') or tell Python it is a raw string (r'\b').
>>> import re
>>> re.findall('\b', 'test')  # the backslash gets consumed by the python string interpreter
[]
>>> re.findall('\\b', 'test')  # backslash is explicitly escaped and is passed through to re module
['', '']
>>> re.findall(r'\b', 'test')  # often this syntax is easier
['', '']
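To connect this back to the question: the r prefix does not switch between "regex" and "exact match". Whatever string reaches re.compile is treated as a regex. The sketch below, with illustrative variable names and sample text of my own, shows that a raw string and an escaped normal string give the same pattern, and that an unprefixed string is still interpreted as a regex.

```python
import re

# A raw string and an escaped normal string are the same two-part text.
raw = r"\d+"
escaped = "\\d+"
same_string = raw == escaped

# Both therefore compile to patterns that behave identically.
text = "order 66, item 7"  # hypothetical sample input
raw_hits = re.compile(raw).findall(text)
escaped_hits = re.compile(escaped).findall(text)

# No r prefix and no backslash: "." is still a regex wildcard,
# so this is not an exact-match search for the text "a.c".
dot_hits = re.findall("a.c", "abc a-c ac")

# The prefix only changes how the literal is read in source code.
lengths = (len(r"\n"), len("\n"))

print(same_string, raw_hits, escaped_hits, dot_hits, lengths)
```

For a truly literal match, re.escape(pattern) or a plain substring test with `in` is the usual route.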