
Why \g<0> behaves differently than \0 in re.sub?

Tags: python, regex

I'm using Python 3.3

re.sub("(.)(.)",r"\2\1\g<0>","ab")  returns baab

BUT

re.sub("(.)(.)",r"\2\1\0","ab")  returns ba

Is this a bug in the sub method or does the sub method not recognize \0 on purpose for some reason?

Harry Spier asked Jan 09 '14 19:01


1 Answer

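In a replacement string, \0 is interpreted as an octal escape for the null character (\x00), because group numbers start at 1 in Python. The second call therefore returns 'ba\x00', which looks like ba when printed because the null character is invisible. According to the re module documentation: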

\number

Matches the contents of the group of the same number. Groups are numbered starting from 1. For example, (.+) \1 matches 'the the' or '55 55', but not 'thethe' (note the space after the group). This special sequence can only be used to match one of the first 99 groups. If the first digit of number is 0, or number is 3 octal digits long, it will not be interpreted as a group match, but as the character with octal value number. Inside the '[' and ']' of a character class, all numeric escapes are treated as characters.

So it's not a bug but intended behaviour, since it is documented. To refer to the whole match in a replacement string, use \g<0>.
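
A quick way to see the difference is to inspect the results with repr() and len(), which make the otherwise invisible null character visible. This is only a small sketch using the two calls from the question; the variable names are made up for illustration:

```python
import re

# \g<0> refers to the whole match, so "ab" is appended after the swapped groups.
with_whole_match = re.sub(r"(.)(.)", r"\2\1\g<0>", "ab")

# \0 is read as an octal escape, so a null character is appended instead.
with_null_char = re.sub(r"(.)(.)", r"\2\1\0", "ab")

print(repr(with_whole_match))  # 'baab'
print(repr(with_null_char))    # 'ba\x00'
print(len(with_null_char))     # 3
```

The second result has three characters, not two: the trailing \x00 is there, it just doesn't show up when the string is printed normally.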

Maxime Lorant answered Sep 18 '22 20:09