For example, I have text with a lot of product dimensions like "2x4" which I'd like to convert to "2 xby 4".
import re
pattern = r"([0-9])\s*[xX\*]\s*([0-9])"
re.sub(pattern, r"\1 xby \2", "2x4")
'2 xby 4' # good
re.sub(pattern, r"\1 xby \2", "2x4x12")
'2 xby 4x12' # not good. need this to be '2 xby 4 xby 12'
One way of describing what I want is to repeat the replacement until no more replacements can be made. For example, I can simply apply the above replacement twice to get what I want:
x = re.sub(pattern, r"\1 xby \2", "2x4x12")
x = re.sub(pattern, r"\1 xby \2", x)
'2 xby 4 xby 12'
But I assume there's a better way.
You can use this lookahead regex for search:
r'([0-9]+)\s*[xX*]\s*(?=[0-9]+)'
(?=[0-9]+) is a positive lookahead: it asserts that a second number follows, but it does not consume that number. The regex engine's position stays just before it, so the same number is free to serve as the left-hand side of the next match.
And use this for replacement:
r'\1 xby '
Code:
>>> pattern = r'([0-9]+)\s*[xX*]\s*(?=[0-9]+)'
>>> re.sub(pattern, r'\1 xby ', "2x4")
'2 xby 4'
>>> re.sub(pattern, r'\1 xby ', "2x4x12")
'2 xby 4 xby 12'
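To see why the original pattern stops after the first pair while the lookahead version keeps going, it can help to list what each pattern actually matches. This is only an illustrative sketch; the variable names are made up for the example.

```python
import re

# Pattern from the question: consumes the digit on both sides of the "x".
consuming = r"([0-9])\s*[xX*]\s*([0-9])"
# Pattern from this answer: consumes the left number, only peeks at the right one.
lookahead = r"([0-9]+)\s*[xX*]\s*(?=[0-9]+)"

text = "2x4x12"

# The "4" is consumed by the first match, so the second "x" has no digit
# left in front of it and is never matched.
consuming_matches = [m.group(0) for m in re.finditer(consuming, text)]

# The "4" is only asserted, not consumed, so it can start the next match.
lookahead_matches = [m.group(0) for m in re.finditer(lookahead, text)]

print(consuming_matches)  # ['2x4']
print(lookahead_matches)  # ['2x', '4x']
```

Because every separator gets its own match, a single re.sub call is enough.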
I think you can approach this with a single pass, by thinking a little differently about it. What you are really attempting to do is replace the x with xby, so you can scan the whole string once, as long as you don't consume the digits to the right of the x.
For this, I recommend a look-ahead assertion. Basically, confirm that the thing you are replacing is followed by digits, but do not eat the digits in the process. The notation is (?=...); see the re module documentation.
Here is what I have. Note that compiling the regex is optional, and \d is a common shorthand for [0-9] (for str patterns in Python 3 it also matches other Unicode decimal digits unless re.ASCII is set):
pattern = re.compile(r"(\d+)\s*[xX\*]\s*(?=\d)")
pattern.sub(r"\1 xby ", "2x4x12")
'2 xby 4 xby 12'
In one pass, it will process the whole string.
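If this replacement is needed in several places, the compiled pattern could be wrapped in a small helper. The following is a minimal sketch; the function name spell_out_dimensions and the sample inputs are assumptions made for illustration, not part of the original question.

```python
import re

# Same pattern as above: a number, a separator (x, X or *) with optional
# whitespace around it, followed by (but not consuming) another digit.
_DIM_SEP = re.compile(r"(\d+)\s*[xX*]\s*(?=\d)")


def spell_out_dimensions(text):
    """Replace every dimension separator between numbers with ' xby '."""
    return _DIM_SEP.sub(r"\1 xby ", text)


print(spell_out_dimensions("2x4x12"))      # 2 xby 4 xby 12
print(spell_out_dimensions("2 X 4 * 12"))  # 2 xby 4 xby 12
print(spell_out_dimensions("max 10"))      # max 10 (the x is not between numbers)
```

Since the separator must be preceded by a digit and followed by one, an x inside an ordinary word is left alone.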