Python regex \number adding \0x

Question

I am trying to do a simple regex replace on a string in Python. This is my code:

>>> s = "num1 1 num2 5"
>>> re.sub("num1 (.*?) num2 (.*?)","1 \1 2 \2",s)

I would expect an output like this, with the \numbers being replaced by their corresponding groups.

'1 1 2 5'

However, this is the output I am getting:

'1 \x01 2 \x025'

And I'm kinda stumped as to why the \x0s are there instead of what I would like to be there. Many thanks for any help.

Nolen Royalty · Accepted Answer

You need to start using raw strings (prefix the string with r):

>>> import re
>>> s = "num1 1 num2 5"
>>> re.sub(r"num1 (.*?) num2 (.*?)", r"1 \1 2 \2", s)
'1 1 2 5'

Otherwise you would need to escape your backslashes both for Python and for the regex, like this:

>>> re.sub("num1 (.*?) num2 (.*?)", "1 \\1 2 \\2", s)
'1 1 2 5'

(This gets really old really fast; check out the opening paragraphs of the Python regex docs.)
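
As a quick check of the two spellings above, here is a minimal sketch (the variable names `pattern`, `raw_repl`, and `escaped_repl` are only for illustration) showing that the raw string and the doubled-backslash string are the same replacement text and give the same result:

```python
import re

s = "num1 1 num2 5"
pattern = r"num1 (.*?) num2 (.*?)"

raw_repl = r"1 \1 2 \2"        # raw string: backslashes reach re.sub untouched
escaped_repl = "1 \\1 2 \\2"   # normal string: each backslash doubled for Python

# Both literals describe the same string, so re.sub sees the same template.
assert raw_repl == escaped_repl

result_raw = re.sub(pattern, raw_repl, s)
result_escaped = re.sub(pattern, escaped_repl, s)
print(result_raw)       # 1 1 2 5
print(result_escaped)   # 1 1 2 5
```

The raw-string form is usually easier to read, since what you type is what the regex engine receives.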

Amber · Answer

\1 and \2 are being interpreted by Python as octal character code escapes in the string literal, rather than just being passed to the regex engine. Using raw strings r"\1" instead of "\1" prevents this interpretation.

>>> "\17"
'\x0f'
>>> r"\17"
'\\17'
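
To tie this back to the question, a small sketch (variable names are illustrative) that inspects what each literal actually contains and reproduces the unexpected output:

```python
import re

normal = "\1"    # octal escape: a single character with code point 1
raw = r"\1"      # raw string: a backslash followed by the digit 1

print(len(normal), repr(normal))   # 1 '\x01'
print(len(raw), repr(raw))         # 2 '\\1'

s = "num1 1 num2 5"
# Without the r prefix, re.sub never sees a group reference,
# only the control characters \x01 and \x02.
broken = re.sub("num1 (.*?) num2 (.*?)", "1 \1 2 \2", s)
print(repr(broken))                # '1 \x01 2 \x025'
```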


ACarter
