Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Use regular expressions to replace overlapping subpatterns

Tags:

python

regex

I have the following regular expression substitution:

input=re.sub( r"([a-zA-Z0-9])\s+([a-zA-Z0-9])" , r"\1*\2" , input )

I use the regular expression on the string "3 a 5 b".

I get back "3*a 5*b".

I am thinking I should get back "3*a*5*b".

So somehow my regular expression substitutions are interfering with each other.

What can I do to get the result I want, other than iterative runs of the regular expression?

like image 352
Richard Avatar asked Mar 08 '13 19:03

Richard


2 Answers

Use a lookahead assertion, (?=...), so as not to eat up the second pattern:

In [33]: re.sub( r"([a-zA-Z0-9])\s+(?=[a-zA-Z0-9])" , r"\1*" , '3 a 5 b')
Out[33]: '3*a*5*b'

In [32]: re.sub( r"([a-zA-Z0-9])\s+(?=[a-zA-Z0-9])" , r"\1*" , "3 /a 5! b" )
Out[32]: '3 /a*5! b'
like image 88
unutbu Avatar answered Nov 11 '22 08:11

unutbu


Regular expressions are not always the best tool for the job when using Python. For the case you describe above, Python offers a much simpler, more readable, and more maintainable method:

>>> s = "3 a 5 b"
>>> '*'.join(s.split())
'3*a*5*b'
like image 45
Jeffrey Froman Avatar answered Nov 11 '22 09:11

Jeffrey Froman