
How to replace all occurrences of regex as if applying replace repeatedly

Tags: python, regex

For example, I have text with a lot of product dimensions like "2x4" which I'd like to convert to "2 xby 4".

import re

pattern = r"([0-9])\s*[xX\*]\s*([0-9])"

re.sub(pattern, r"\1 xby \2", "2x4")
'2 xby 4' # good

re.sub(pattern, r"\1 xby \2", "2x4x12")
'2 xby 4x12' # not good. need this to be '2 xby 4 xby 12'

One way of describing what I want is to repeat the replacement until no more replacements can be made. For example, I can simply do the above replacement twice to get what I want:

x = re.sub(pattern, r"\1 xby \2", "2x4x12")
x = re.sub(pattern, r"\1 xby \2", x)
'2 xby 4 xby 12'
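
Generalized, that repetition might look like the sketch below. It relies on re.subn, which returns the new string together with the number of substitutions made, and stops once a pass changes nothing. The helper name sub_until_stable and the max_passes safety cap are illustrative assumptions, not part of the code above.

```python
import re

PATTERN = r"([0-9])\s*[xX\*]\s*([0-9])"


def sub_until_stable(pattern, repl, text, max_passes=100):
    """Apply re.subn repeatedly until a pass makes no replacements.

    max_passes is a hypothetical guard against patterns whose
    replacement keeps producing new matches forever.
    """
    for _ in range(max_passes):
        text, count = re.subn(pattern, repl, text)
        if count == 0:
            break
    return text


print(sub_until_stable(PATTERN, r"\1 xby \2", "2x4x12"))
```

Each pass rescans the whole string, so this costs one extra pass per overlapping match plus a final pass that finds nothing.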

But I assume there's a better way.

Ben asked Mar 11 '16 19:03


2 Answers

You can use this lookahead regex for search:

r'([0-9]+)\s*[xX*]\s*(?=[0-9]+)'

(?=[0-9]+) is a positive lookahead: it asserts that a second number follows, but it does not consume that number, so the number is still available as the start of the next match.

And use this for replacement:

r'\1 xby '


Code:

>>> import re

>>> pattern = r'([0-9]+)\s*[xX*]\s*(?=[0-9]+)'

>>> re.sub(pattern, r'\1 xby ', "2x4")
'2 xby 4'

>>> re.sub(pattern, r'\1 xby ', "2x4x12")
'2 xby 4 xby 12'
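
To see why the lookahead makes a single pass sufficient, it may help to list what each pattern actually matches. The sketch below compares the pattern from the question with the lookahead pattern using re.finditer. The names CONSUMING, LOOKAHEAD and matched_spans are illustrative only.

```python
import re

# Pattern from the question: the second digit is part of the match.
CONSUMING = r"([0-9])\s*[xX*]\s*([0-9])"
# Lookahead pattern: the second number is checked but not consumed.
LOOKAHEAD = r"([0-9]+)\s*[xX*]\s*(?=[0-9]+)"


def matched_spans(pattern, text):
    """Return (matched_text, (start, end)) for every non-overlapping match."""
    return [(m.group(0), m.span()) for m in re.finditer(pattern, text)]


consuming_matches = matched_spans(CONSUMING, "2x4x12")
lookahead_matches = matched_spans(LOOKAHEAD, "2x4x12")

print(consuming_matches)  # [('2x4', (0, 3))]
print(lookahead_matches)  # [('2x', (0, 2)), ('4x', (2, 4))]
```

With the consuming pattern the "4" is used up by the first match, so "x12" has no digit left in front of it. With the lookahead the "4" stays available and begins the second match.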
anubhava answered Oct 17 '22 14:10


I think you can approach this with a single pass, by thinking a little differently about it. What you are really attempting to do is replace the x with xby -- so you can scan the whole string once, if you don't consume the right side of the digits.

For this, I recommend a look-ahead assertion: confirm that the thing you are replacing is followed by digits, but do not consume the digits in the process. The notation is (?=...); see the documentation for the re module.

Here is what I have. Note that compiling the regex is optional, and that \d is a common shorthand for [0-9] (for str patterns in Python 3 it also matches other Unicode decimal digits unless re.ASCII is set):

import re

pattern = re.compile(r"(\d+)\s*[xX\*]\s*(?=\d)")
pattern.sub(r"\1 xby ", "2x4x12")

'2 xby 4 xby 12'

In one pass, it will process the whole string.
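
One caveat when swapping [0-9] for \d: for str patterns in Python 3, \d matches any Unicode decimal digit, not only 0 through 9. The sketch below shows the difference using the re.ASCII flag. The variable names and the sample text with Arabic-Indic digits are illustrative assumptions.

```python
import re

# Same regex twice; only the flag differs.
unicode_digits = re.compile(r"(\d+)\s*[xX*]\s*(?=\d)")
ascii_digits = re.compile(r"(\d+)\s*[xX*]\s*(?=\d)", re.ASCII)

# ARABIC-INDIC DIGIT TWO, 'x', ARABIC-INDIC DIGIT FOUR
text = "\u0662x\u0664"

unicode_result = unicode_digits.sub(r"\1 xby ", text)  # digits match, 'x' is replaced
ascii_result = ascii_digits.sub(r"\1 xby ", text)      # no match, text unchanged

print(unicode_result)
print(ascii_result)
print(ascii_digits.sub(r"\1 xby ", "2x4x12"))
```

If the input only ever contains ASCII digits, both forms behave the same. Otherwise [0-9] or re.ASCII keeps the match restricted to 0 through 9.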

F1Rumors answered Oct 17 '22 15:10