
Confusion escaping single quotes in a single-quoted raw string literal

The following works as expected:

>>> print re.sub('(\w)"(\W)', r"\1''\2", 'The "raw string literal" is a special case of a "string literal".')
The "raw string literal'' is a special case of a "string literal''.

Since I wanted to use single quotes in the replacement expression (is that the correct terminology?), I quoted it using double quotes.

But then for my edification I tried using single quotes in the replacement expression and can't understand the results:

>>> print re.sub('(\w)"(\W)', r'\1\'\'\2', 'The "raw string literal" is a special case of a "string literal".')
The "raw string literal\'\' is a special case of a "string literal\'\'.

Shouldn't the two forms produce exactly the same output?

So, my questions are:

  1. How do I escape a single quote in a single-quoted raw string?
  2. How do I escape a double quote in a double-quoted raw string?
  3. Why is it that I didn't have to use a raw string for the first parameter to re.sub(), but I do have to for the second? Both seem like string representations of regexes to this Python noob.

If it makes a difference, I am using Python 2.7.5 on Mac OS X (10.9, Mavericks).

markvgti asked Feb 15 '23 15:02



2 Answers

No, they should not. A raw string literal does let you escape quotes, but the backslashes will be included:

>>> r"\'"
"\\'"

where Python echoes the resulting string as a string literal with the backslash escaped.
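
To make the difference between the two replacement strings from the question concrete, here is a minimal sketch. The variable names are only illustrative, and the raw pattern r'(\w)"(\W)' is used in place of the non-raw one from the question; the comparison shows that the single-quoted form carries two extra backslashes, which re.sub() then copies into the result.

```python
import re

text = 'The "raw string literal" is a special case of a "string literal".'

double_quoted = r"\1''\2"      # replacement from the first call in the question
single_quoted = r'\1\'\'\2'    # replacement from the second call in the question

# The two literals are different strings: the second keeps its backslashes.
print(len(double_quoted), len(single_quoted))
print(double_quoted == single_quoted)

# re.sub() leaves the unknown escape \' alone, so the backslashes survive.
result = re.sub(r'(\w)"(\W)', single_quoted, text)
print(result)
```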

This is explicitly documented behaviour of the raw string literal syntax:

When an 'r' or 'R' prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string. For example, the string literal r"\n" consists of two characters: a backslash and a lowercase 'n'. String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes).
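
The documented rules can be checked directly. The following is a small sketch rather than anything from the documentation itself: the variable names are made up, and compile() is used only as a convenient way to observe the syntax error without breaking the script.

```python
# Raw string literals keep every backslash, including one that "escapes" a quote.
single = r'\''      # backslash + single quote
double = r"\""      # backslash + double quote
newline = r"\n"     # backslash + the letter n, not a line feed

print(len(single), len(double), len(newline))
print(single == "\\'")

# The source text r"\" is rejected by the parser; compile() lets us see that.
try:
    compile('r"\\"', "<example>", "eval")
    ends_in_backslash_ok = True
except SyntaxError:
    ends_in_backslash_ok = False
print(ends_in_backslash_ok)
```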

If you didn't use a raw string literal for the second parameter, Python would interpret each \digit combination as an octal escape, producing a single control character instead of a backslash followed by a digit:

>>> '\0'
'\x00'
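
As a rough illustration of what that means for the call in the question (the variable names here are hypothetical), a non-raw replacement string reaches re.sub() with the group references already turned into control characters, so no groups are substituted:

```python
import re

text = 'The "raw string literal" is a special case of a "string literal".'

# Without the r prefix, \1 and \2 are octal escapes, not group references.
non_raw = '\1\'\'\2'
print(non_raw == "\x01''\x02")
print(len(non_raw))

# The control characters are inserted literally; the matched letter is lost.
result = re.sub(r'(\w)"(\W)', non_raw, text)
print(repr(result))
```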

You can construct the same string without a raw string literal by doubling the backslashes:

>>> '\\1\'\'\\2'
"\\1''\\2"
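
A quick check, sketched here with illustrative variable names, confirms that the doubled-backslash form and the double-quoted raw form are the same string and that either one produces the output shown in the question:

```python
import re

text = 'The "raw string literal" is a special case of a "string literal".'

doubled = '\\1\'\'\\2'   # ordinary literal with doubled backslashes
raw = r"\1''\2"          # raw literal delimited by double quotes

print(doubled == raw)

result = re.sub(r'(\w)"(\W)', doubled, text)
print(result)
```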
Martijn Pieters answered Feb 17 '23 10:02



To answer the questions of the OP:

How do I escape a single quote in a single-quoted raw string?

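That is not possible without the backslash ending up in the string. A backslash in front of the quote keeps the literal from ending there, but the backslash remains part of the string (as Martijn pointed out), so this only helps in the special case where you actually want a backslash before the quote.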

How do I escape a double quote in a double-quoted raw string?

See above; the same applies with the two quote characters swapped.
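
Since escaping is not an option, the usual workarounds are to pick a different delimiter or to join adjacent literals. The sketch below shows three spellings of the replacement string from the question; the variable names are illustrative only.

```python
# Three ways to get bare single quotes into a raw string.
other_delimiter = r"\1''\2"          # delimit with double quotes instead
triple_quoted = r'''\1''\2'''        # triple quotes allow one or two quotes inside
concatenated = r'\1' "''" r'\2'      # adjacent literals are joined by the parser

print(other_delimiter == triple_quoted == concatenated)

# The same idea works for a double quote inside a raw pattern.
pattern_a = r'(\w)"(\W)'
pattern_b = r"""(\w)"(\W)"""
print(pattern_a == pattern_b)
```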

Why is it that I didn't have to use a raw string for the first parameter to re.sub(), but I do have to for the second? Both seem like string representations of regexes to this Python noob.

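Completing Martijn's answer (which only covered the second parameter): because the first parameter is not a raw string, Python tries to interpret each backslash together with the character that follows it as an escape sequence. Since \w and \W are not recognized escape sequences, the backslash is kept in the string as a literal character, so the pattern happens to reach re.sub() intact. Relying on this is fragile: a recognized sequence such as \t is converted, and newer Python 3 releases warn about unrecognized escape sequences, so a raw string is the safer choice for the pattern as well. Compare: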

>>> '(\w)"(\W)'
'(\\w)"(\\W)'
>>> '(\t)"(\W)'
'(\t)"(\\W)'
Andreas Maier answered Feb 17 '23 12:02
