Python regex - r prefix

Can anyone explain why example 1 below works, when the r prefix is not used? I thought the r prefix must be used whenever escape sequences are used. Example 2 and example 3 demonstrate this.

    # example 1
    import re
    print(re.sub('\s+', ' ', 'hello     there      there'))
    # prints 'hello there there' - not expected as r prefix is not used

    # example 2
    import re
    print(re.sub(r'(\b\w+)(\s+\1\b)+', r'\1', 'hello     there      there'))
    # prints 'hello     there' - as expected as r prefix is used

    # example 3
    import re
    print(re.sub('(\b\w+)(\s+\1\b)+', '\1', 'hello     there      there'))
    # prints 'hello     there      there' - as expected as r prefix is not used

asked Feb 11 '10 01:02

2 Answers

Because \ begins an escape sequence only when the characters that follow it form a valid escape sequence. Otherwise the backslash is left in the string as-is.

    >>> '\n'
    '\n'
    >>> r'\n'
    '\\n'
    >>> print '\n'


    >>> print r'\n'
    \n
    >>> '\s'
    '\\s'
    >>> r'\s'
    '\\s'
    >>> print '\s'
    \s
    >>> print r'\s'
    \s
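The session above uses the Python 2 print statement. A minimal Python 3 sketch of the same check follows. It assumes a small helper, parse_literal (a hypothetical name, not part of any library), that parses the source text of a literal with warnings silenced, because recent Python versions warn about unrecognized escapes such as \s and the documentation says they will eventually become an error.

```python
import ast
import warnings


def parse_literal(source):
    """Parse the source text of a string literal, silencing escape warnings."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return ast.literal_eval(source)


# Source text for the literals '\s' and '\n', written raw so that nothing
# is interpreted before the parser sees it.
unknown_escape = parse_literal(r"'\s'")
known_escape = parse_literal(r"'\n'")

print(len(unknown_escape))      # 2: the backslash is kept, followed by 's'
print(len(known_escape))        # 1: a single newline character
print(unknown_escape == r'\s')  # True
```

This is why example 1 in the question works: '\s+' and r'\s+' end up as the same three characters, so the regex engine receives an identical pattern.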

Unless an 'r' or 'R' prefix is present, escape sequences in strings are interpreted according to rules similar to those used by Standard C. The recognized escape sequences are:

    Escape Sequence   Meaning
    \newline          Ignored
    \\                Backslash (\)
    \'                Single quote (')
    \"                Double quote (")
    \a                ASCII Bell (BEL)
    \b                ASCII Backspace (BS)
    \f                ASCII Formfeed (FF)
    \n                ASCII Linefeed (LF)
    \N{name}          Character named name in the Unicode database (Unicode only)
    \r                ASCII Carriage Return (CR)
    \t                ASCII Horizontal Tab (TAB)
    \uxxxx            Character with 16-bit hex value xxxx (Unicode only)
    \Uxxxxxxxx        Character with 32-bit hex value xxxxxxxx (Unicode only)
    \v                ASCII Vertical Tab (VT)
    \ooo              Character with octal value ooo
    \xhh              Character with hex value hh
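The table also suggests why example 3 in the question changes nothing: \b and \1 are recognized escapes (backspace and an octal escape), so they are converted before the regex engine ever sees them. The sketch below spells out the characters that example 3 actually stores. The variable names are only illustrative.

```python
import re

text = 'hello     there      there'

# In a plain literal, \b and \1 are recognized escapes and get converted.
print('\b' == '\x08')          # True: backspace, not a regex word boundary
print('\1' == '\x01')          # True: octal escape for the character with value 1
print(len(r'\b'), len(r'\1'))  # 2 2: raw literals keep the backslash

# The pattern from example 3, spelled with the characters Python stores.
# \\w and \\s are doubled here only to avoid unrecognized-escape warnings.
example3_pattern = '(\x08\\w+)(\\s+\x01\x08)+'
print(re.sub(example3_pattern, '\x01', text) == text)  # True: nothing matches

# With raw strings the regex engine receives \b and \1 intact.
print(re.sub(r'(\b\w+)(\s+\1\b)+', r'\1', text))  # hello     there
```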

Never rely on raw strings for path literals, as raw strings have some rather peculiar rules about backslashes next to quotes, known to have bitten people in the ass:

When an "r" or "R" prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string. For example, the string literal r"\n" consists of two characters: a backslash and a lowercase "n". String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the string, not as a line continuation.

To better illustrate this last point:

    >>> r'\'
    SyntaxError: EOL while scanning string literal
    >>> r'\''
    "\\'"
    >>> '\'
    SyntaxError: EOL while scanning string literal
    >>> '\''
    "'"
    >>>
    >>> r'\\'
    '\\\\'
    >>> '\\'
    '\\'
    >>> print r'\\'
    \\
    >>> print r'\'
    SyntaxError: EOL while scanning string literal
    >>> print '\\'
    \
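For path literals that must end in a backslash, a few spellings that avoid the trailing-backslash problem are sketched below. The directory and file names are made up for illustration, and ntpath is used only so that the Windows-style join behaves the same on any platform.

```python
import ntpath  # Windows path rules, importable on any platform

# r'C:\temp\' would be a SyntaxError, so the trailing backslash needs
# another spelling.
doubled = 'C:\\temp\\'
concatenated = r'C:\temp' + '\\'
print(doubled == concatenated)  # True

# A backslash before a quote in a raw string stays in the string.
print(len(r'\''))  # 2

# Letting a path function add the separator avoids the issue entirely.
joined = ntpath.join(r'C:\temp', 'report.txt')
print(joined == r'C:\temp\report.txt')  # True
```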

answered Oct 09 '22 03:10

Esteban Küber

The 'r' means that the following is a "raw string", i.e. backslash characters are treated literally instead of signifying special treatment of the following character.

http://docs.python.org/reference/lexical_analysis.html#literals

So '\n' is a single newline,
and r'\n' is two characters: a backslash and the letter 'n'.
Another way to write it would be '\\n', because the first backslash escapes the second.

An equivalent way of writing this:

    print(re.sub(r'(\b\w+)(\s+\1\b)+', r'\1', 'hello     there      there'))

is this:

    print(re.sub('(\\b\\w+)(\\s+\\1\\b)+', '\\1', 'hello     there      there'))

Because of the way Python treats characters that are not valid escape characters, not all of those double backslashes are necessary, e.g. '\s' == '\\s'. However, the same is not true for '\b' and '\\b': '\b' is a single backspace character, while '\\b' is a backslash followed by the letter 'b'. My preference is to be explicit and double all the backslashes.
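A quick way to check these claims is to compare the literals directly. This is a small sketch with illustrative variable names, reusing the pattern and input from the question.

```python
import re

text = 'hello     there      there'

raw_pattern = r'(\b\w+)(\s+\1\b)+'
doubled_pattern = '(\\b\\w+)(\\s+\\1\\b)+'

print(raw_pattern == doubled_pattern)  # True: same characters either way
print(r'\1' == '\\1')                  # True
print('\b' == '\\b')                   # False: backspace vs. backslash + 'b'

# Both spellings therefore give the same substitution result.
raw_result = re.sub(raw_pattern, r'\1', text)
doubled_result = re.sub(doubled_pattern, '\\1', text)
print(raw_result == doubled_result)    # True
```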

answered Oct 09 '22 04:10

John La Rooy

Tags: python, string, regex, literals, prefix

JT.
