example:
import re
s = r't\s t t\\s'
print(re.findall('\s',s))
print(re.findall('\\s',s))
I found that the two statements print the same result, [' ', ' '], which suggests that \s and \\s are the same in a Python string literal. Indeed, when I type the following in Python's interactive interpreter, I get this:
>>> str1 = '\s'
>>> str1
'\\s'
So it seems that Python converts \s to \\s. Why would Python do this, and what is it for? Is it the same in other languages like Java?
What I'm really asking is this: in Python, if I want to match whitespace, the regex and the string literal I type can both be "\s", right? In Java, however, the regex is "\s" while the string literal has to be "\\s". The two languages seem to treat the string "\s" differently. Why?
So it seems that Python converts \s to \\s.
Don't confuse a string's representation with its actual content. The representation is the way you write the string in source code, which is not necessarily identical to the string held in memory. Backslashes are parsed specially so that you can write non-printable characters with backslash syntax. Here, \s is not a valid escape sequence, so the Python parser interprets it literally as backslash-s. In memory, the string is a sequence of two characters: \ and s.
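To make the difference between source text and in-memory content concrete, here is a minimal sketch. The helper name parse_literal is made up for this illustration; it uses ast.literal_eval to parse a literal from its source text and hides the warning that newer Python versions emit for \s.

```python
import ast
import re
import warnings

def parse_literal(source_text):
    """Hypothetical helper: parse a string literal from its source text,
    ignoring the invalid-escape warning that newer Pythons emit."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return ast.literal_eval(source_text)

one_backslash = parse_literal(r"'\s'")     # written as '\s' in source
two_backslashes = parse_literal(r"'\\s'")  # written as '\\s' in source
raw = parse_literal(r"r'\s'")              # written as r'\s' in source

# All three spellings produce the same two-character string.
print(len(one_backslash))                        # 2
print(one_backslash == two_backslashes == raw)   # True

# So the regex engine receives an identical pattern in both calls.
s = r't\s t t\\s'
print(re.findall(one_backslash, s) == re.findall(two_backslashes, s))  # True
```

Because the re module only ever sees the two-character pattern, both findall calls in the question match the two spaces in s.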
The str class has a __repr__() method (called by repr()) that returns the source-code representation of the string; this is what gets printed when you evaluate an expression in the REPL without using print. It lets you copy and paste the string and reuse it elsewhere in the shell, but it is not what is stored in memory, nor how Python interprets the string. When producing the repr, Python always escapes a literal backslash, to remove any ambiguity about whether a backslash starts an escape sequence or is a literal character.
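A short sketch of the repr/print distinction described here. The variable names are illustrative, and the string is built with chr(92) so that the example itself contains no escape sequence at all.

```python
import ast

text = chr(92) + "s"     # a backslash followed by s, two characters
shown = repr(text)       # what the REPL echoes for this string

print(text)              # \s
print(shown)             # '\\s'
print(len(text), len(shown))  # 2 5

# The repr is valid source code that evaluates back to the same string,
# which is why it can be copied and pasted into the shell.
round_trip = ast.literal_eval(shown)
print(round_trip == text)     # True
```

The repr is five characters long (two quotes, two backslashes, one s) even though the string itself only has two.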
Why would Python do this and what is this for? Is it the same in other languages like Java?
Most languages' string literals interpret backslash escape sequences, although different languages treat invalid escape sequences differently. In Python, an invalid backslash escape sequence is kept as a literal backslash followed by the character instead of producing an error (recent Python 3 versions do emit a warning for it). You'd probably run into this more often in Python because of its ubiquitous repr() protocol and the REPL's default use of repr.
Python is simply keeping the backslash: when it sees a "\" followed by a letter that has no special meaning as an escape, it treats the backslash as a literal character instead of raising an error.
The Python interactive interpreter uses repr to produce a printable representation of an object, and that function adds the extra backslash to indicate a literal backslash.
If you use print to show the value of str1, it is written to stdout with just one backslash.
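As a rough illustration of how Python treats a recognized escape versus an unrecognized one, the sketch below parses both from source text and records any warnings raised along the way. The helper parse_with_warnings is a made-up name, and which warning category appears (if any) depends on the Python version, so treat the warning lists as informational.

```python
import ast
import warnings

def parse_with_warnings(source_text):
    """Hypothetical helper: parse a string literal from source text and
    return (value, names of warning categories raised while parsing)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        value = ast.literal_eval(source_text)
    return value, [w.category.__name__ for w in caught]

tab, tab_warnings = parse_with_warnings(r"'\t'")          # recognized escape
unknown, unknown_warnings = parse_with_warnings(r"'\s'")  # unrecognized escape

# '\t' collapses to a single tab character.
print(len(tab), tab_warnings)

# '\s' keeps both characters; newer Pythons also report a warning here.
print(len(unknown), unknown_warnings)
```

Writing the pattern as a raw string, r'\s', or doubling the backslash, '\\s', avoids relying on this fallback.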
Look at this example:
str1 = '\s'
print(str1)        # \s
print(repr(str1))  # '\\s'