I am trying to do a simple regex replace on a string in Python. This is my code:
>>> s = "num1 1 num2 5"
>>> re.sub("num1 (.*?) num2 (.*?)","1 \1 2 \2",s)
I would expect an output like this, with the \numbers being replaced with their corresponding groups.
'1 1 2 5'
However, this is the output I am getting:
'1 \x01 2 \x025'
And I'm kinda stumped as to why the \x0s are there instead of what I would like to be there. Many thanks for any help.
You need to start using raw strings (prefix the string with r):
>>> import re
>>> s = "num1 1 num2 5"
>>> re.sub(r"num1 (.*?) num2 (.*?)", r"1 \1 2 \2", s)
'1 1 2 5'
Otherwise you would need to escape your backslashes both for Python and for the regex, like this:
>>> re.sub("num1 (.*?) num2 (.*?)", "1 \\1 2 \\2", s)
'1 1 2 5'
(This gets really old really fast; check out the opening paragraphs of the Python regex docs.)
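As a quick sanity check, here is a minimal sketch showing that the raw-string form and the doubled-backslash form are the same string, so both give the same result. The variable names are just for illustration.

```python
import re

s = "num1 1 num2 5"
pattern = r"num1 (.*?) num2 (.*?)"

# Two spellings of the same replacement string
raw_repl = r"1 \1 2 \2"       # raw string: backslashes are kept as typed
escaped_repl = "1 \\1 2 \\2"  # normal string: each \\ becomes one backslash

same_string = raw_repl == escaped_repl
result_raw = re.sub(pattern, raw_repl, s)
result_escaped = re.sub(pattern, escaped_repl, s)

print(same_string)
print(result_raw)
print(result_escaped)
```

Since the two literals compare equal, the choice between them is purely about readability.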
\1 and \2 are getting interpreted as octal character code escapes, rather than just getting passed to the regex engine. Using raw strings r"\1" instead of "\1" prevents this interpretation.
>>> "\17"
'\x0f'
>>> r"\17"
'\\17'
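To see this on the exact strings from the question, the sketch below compares the two literals directly and then reproduces the unexpected output. The variable names are made up for the example.

```python
import re

plain = "\1"   # Python turns the octal escape \1 into the single character U+0001
raw = r"\1"    # raw string: a backslash followed by the digit 1

plain_len = len(plain)
raw_len = len(raw)
plain_is_control_char = plain == "\x01"

# The replacement from the question has no backslashes left by the time
# re.sub sees it, so the control characters are inserted literally.
broken = re.sub("num1 (.*?) num2 (.*?)", "1 \1 2 \2", "num1 1 num2 5")

print(plain_len, raw_len, plain_is_control_char)
print(repr(broken))
```

The trailing 5 in the output is simply the part of the input that the lazy second group did not consume, sitting right after the \x02 character.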