Can you please help me understand this behaviour:
>>> a = "abc\\def\\ghi"
>>> a.split(r"\\")
['abc\\def\\ghi']
However, after a few minutes of trying different permutations, I found that this works:
>>> a.split("\\")
['abc', 'def', 'ghi']
Can you point me to the literature or design considerations that result in this behaviour?
Your string contains regular, single backslashes which have been escaped:
>>> a = "abc\\def\\ghi"
>>> a
'abc\\def\\ghi'
>>> print(a)
abc\def\ghi
When you split by "\\", the escape sequence \\ stands for one backslash, so you are splitting by a single backslash and get a list of three elements: ['abc', 'def', 'ghi'].
When you split by r"\\" you are splitting by two backslashes, because prefixing a string with r is Python's raw string notation (which has nothing to do with regexes). The important thing here is that backslashes are not handled in any special way in a raw string literal.
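To make the difference concrete, here is a small sketch that compares the two separators by length. The variable names `single` and `double` are only illustrative and do not come from the question.

```python
a = "abc\\def\\ghi"   # 11 characters: each \\ in the literal is one backslash

single = "\\"         # normal literal: one backslash character
double = r"\\"        # raw literal: two backslash characters

print(len(a))                      # 11
print(len(single), len(double))    # 1 2

# Splitting on one backslash finds both separators in the string.
print(a.split(single))             # ['abc', 'def', 'ghi']

# Two consecutive backslashes never occur in the string, so nothing is split.
print(a.split(double))             # ['abc\\def\\ghi']

# The raw literal is equivalent to this normal literal with four backslashes.
print(double == "\\\\")            # True
```

Checking `len()` on a literal is a quick way to see how many characters it really contains, independent of how the REPL displays it.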
The reason you often see strings prefixed with r when you are looking at people's regex is that they do not want to escape backslash characters which also have a special meaning in regular expressions.
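As a rough illustration of why raw strings are popular for regex patterns, the sketch below splits the same string with `re.split`. Here the pattern itself needs two backslashes, because the regex engine also treats a backslash as an escape character. The variable names are assumptions made for the example.

```python
import re

a = "abc\\def\\ghi"  # contains two single backslashes

# The regex \\ (two characters) matches one literal backslash.
parts_raw = re.split(r"\\", a)      # raw literal: pattern written as-is
parts_plain = re.split("\\\\", a)   # normal literal: every backslash doubled again

print(parts_raw)    # ['abc', 'def', 'ghi']
print(parts_plain)  # ['abc', 'def', 'ghi']

# A pattern consisting of a single backslash is an incomplete escape,
# so the regex engine rejects it.
try:
    re.split("\\", a)
    single_backslash_ok = True
except re.error:
    single_backslash_ok = False

print(single_backslash_ok)  # False
```

Note that this is the opposite of the `str.split` case: with `str.split` the separator is taken literally, whereas a regex pattern is interpreted a second time by the `re` module.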
Further reading on regular expressions: The Backslash Plague (a section of the Python Regular Expression HOWTO)