Invalid group reference in Python 2.7+

I am trying to convert all WikiLink-style strings in my web page (created in Django) into HTML links.

I am using the following expression:

import re
expr = r'\s+[A-Z][a-z]+[A-Z][a-z]+\s'
repl=r'<a href="/photos/\1">\1</a>'
mystr = 'this is a string to Test whether WikiLink will work ProPerly'

parser=re.compile(expr)
parser.sub(repl, mystr)

This returns the following string, with a hex value (\x01) in place of the matched word:

"this is a string to Test whether<a href='/mywiki/\x01>\x01</a>'will work<a href='/mywiki/\x01>\x01</a>'"

Looking at the Python help for re.sub, I tried changing \1 to \g<1>, but that results in an "invalid group reference" error.

Please help me understand how to get this working.

Guru Govindan asked Nov 29 '12 23:11


1 Answer

The problem here is that you don't have any capturing groups in expr.

Whatever part of the match you want to show up as \1, you need to put in parentheses. For example:

>>> expr = r'\s+([A-Z][a-z]+[A-Z][a-z]+)\s'
>>> parser=re.compile(expr)
>>> parser.sub(repl, mystr)
'this is a string to Test whether<a href="/photos/WikiLink">WikiLink</a>will work ProPerly'
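Notice that in this output the spaces on either side of WikiLink are gone ("whether<a ...>will"): the \s characters are consumed by the match even though they are outside group 1, so they are replaced too. If you want to keep them, one option (a variant, not something the fix above requires) is to anchor on zero-width \b word boundaries instead of consuming \s. A minimal sketch, reusing the question's repl and mystr; the name wiki_word is just for illustration. Note that it also links ProPerly, because \b can match at the end of the string where the original trailing \s could not.

```python
import re

# \b is zero-width: it asserts a word boundary without consuming a character,
# so the surrounding spaces are not part of the match and survive the sub().
wiki_word = re.compile(r'\b([A-Z][a-z]+[A-Z][a-z]+)\b')
repl = r'<a href="/photos/\1">\1</a>'
mystr = 'this is a string to Test whether WikiLink will work ProPerly'

result = wiki_word.sub(repl, mystr)
print(result)
```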

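The backreference \1 refers to group 1 within the match, which is the part that matched the first parenthesized subexpression. Likewise, \2 is group 2, the part that matched the second parenthesized subexpression, and so on. If you use \1 (or \g<1>) when the pattern has fewer than 1 group, Python's re raises an "invalid group reference" error, which is exactly what your \g<1> attempt ran into. The \x01 most likely comes from the replacement not being a raw string in the code you actually ran (your output shows /mywiki/, not the /photos/ from the code you posted): in an ordinary string literal, '\1' is an octal escape for chr(1), a ctrl-A, so re never sees a backslash at all. The canonical representation of ctrl-A is '\x01', so that's why you see it that way.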
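A quick way to check where the \x01 comes from is to compare an ordinary and a raw string literal for the replacement, against the question's original pattern (which has no group). This is a minimal sketch run under Python 3; the names text, plain_repl, raw_repl and raw_error are illustrative. The ordinary literal already contains ctrl-A before re is involved, so re inserts it verbatim; the raw literal reaches re as a real \1 and fails because there is no group 1.

```python
import re

expr = r'\s+[A-Z][a-z]+[A-Z][a-z]+\s'   # no capturing group
text = 'a WikiLink b'                    # illustrative input

# Ordinary literal: '\1' is an octal escape, so this string holds chr(1)
# (ctrl-A) and contains no backslash at all.
plain_repl = '<a href="/photos/\1">\1</a>'
# Raw literal: the backslash and the digit both survive.
raw_repl = r'<a href="/photos/\1">\1</a>'

print(repr('\1'), repr(r'\1'))   # '\x01' '\\1'

# re sees no backslash in plain_repl, so the ctrl-A is inserted literally.
plain_result = re.sub(expr, plain_repl, text)
print(repr(plain_result))

# re sees a real \1, but expr defines no group 1, so it raises re.error.
raw_error = None
try:
    re.sub(expr, raw_repl, text)
except re.error as exc:
    raw_error = exc
print('re.error:', raw_error)
```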

Group 0 is the entire match. But that's not what you want in this case, because you don't want the spaces to be part of the substitution.
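A small sketch of the difference, using the capturing version of expr on a made-up sample string: group 0 carries the surrounding whitespace, group 1 does not. In a replacement template, \g<0> stands for the whole match and \1 for group 1.

```python
import re

parser = re.compile(r'\s+([A-Z][a-z]+[A-Z][a-z]+)\s')
sample = 'see WikiLink here'   # illustrative input, not from the question

m = parser.search(sample)
print(repr(m.group(0)))   # ' WikiLink '  (entire match, spaces included)
print(repr(m.group(1)))   # 'WikiLink'    (first parenthesized group only)

print(parser.sub(r'[\g<0>]', sample))   # see[ WikiLink ]here
print(parser.sub(r'[\1]', sample))      # see[WikiLink]here
```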

The only reason you need the \g<n> syntax is when a simple backreference is ambiguous. For example, if the replacement were 123\1456, a reader can't tell whether that means 123, followed by group 1, followed by 456, or 123 followed by group 1456, or something else; writing 123\g<1>456 makes the first reading explicit.
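A concrete case where this bites (a sketch with an illustrative pattern and input; the names digits, appended and ambiguous_error are hypothetical): appending a literal 0 directly after group 1. Python parses \10 as a reference to group 10, which doesn't exist here, so you need \g<1>0.

```python
import re

digits = re.compile(r'(\d+)')
text = 'cost 42'   # illustrative input

# \g<1>0 is unambiguous: group 1, then the literal character "0".
appended = digits.sub(r'\g<1>0', text)
print(appended)   # cost 420

# \10 is read as "group 10", which this pattern doesn't have.
ambiguous_error = None
try:
    digits.sub(r'\10', text)
except re.error as exc:
    ambiguous_error = exc
print('re.error:', ambiguous_error)
```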

Further reading on grouping and backreferences.

abarnert answered Sep 30 '22 11:09