This post shows how to find the shortest overlapping match using regex. One of the answers shows how to get the shortest match, but I am struggling with how to locate the shortest match and mark its position, or substitute it with another string.
So given the string
A|B|A|F|B|C|D|E|F|G
and the pattern I want to locate is:
my_pattern = 'A.*?B.*?C'
How can I identify the shortest match and mark it in the original string like below?
A|B|[A|F|B|C]|D|E|F|G
or substitute:
A|B|AAA|F|BBB|CCC|D|E|F|G
I suggest using Tim Pietzcker's answer together with re.sub:
>>> import re
>>> s = 'A|B|A|F|B|C|D|E|F|G'
>>> p = re.findall(r'(?=(A.*?B.*?C))', s)
>>> re.sub(r'({})'.format(re.escape(min(p, key=len))), r'[\1]', s, flags=re.DOTALL)
'A|B|[A|F|B|C]|D|E|F|G'
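If you need the position of the shortest match rather than just its text, a minimal sketch along the same lines could use re.finditer and the span of the captured group. The helper name mark_shortest and its left/right parameters are my own choices for illustration, not part of the re module. Slicing by position marks only that one occurrence, whereas re.sub on the escaped text would bracket every identical substring in the input.

```python
import re

def mark_shortest(text, pattern, left="[", right="]"):
    """Wrap the shortest (possibly overlapping) match of pattern in brackets."""
    # A capturing group inside a lookahead reports a candidate at every
    # start position, so overlapping matches are not skipped.
    lookahead = re.compile(r"(?=({}))".format(pattern), re.DOTALL)
    spans = [m.span(1) for m in lookahead.finditer(text)]
    if not spans:
        return text  # nothing matched, leave the input unchanged
    # Pick the candidate with the smallest length (first one wins on ties).
    start, end = min(spans, key=lambda span: span[1] - span[0])
    return text[:start] + left + text[start:end] + right + text[end:]

s = "A|B|A|F|B|C|D|E|F|G"
print(mark_shortest(s, r"A.*?B.*?C"))  # A|B|[A|F|B|C]|D|E|F|G
```

The (start, end) pair is the position asked for in the question, so it can be returned directly instead of the marked string if that is more useful.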
One way is to use a negative lookahead between A and B, and then between B and C, like this:
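import re
# No A or C may appear between A and B, and no A or B between B and C.
p = re.compile(r'A(?:(?![AC]).)*B(?:(?![AB]).)*C')
test_str = "A|B|A|F|B|C|D|E|F|G"
# Python refers to the whole match as \g<0>, not $0.
result = re.sub(p, r"[\g<0>]", test_str)
# A|B|[A|F|B|C]|D|E|F|G
test_str = "A|B|C|F|B|C|D|E|F|G"
result = re.sub(p, r"[\g<0>]", test_str)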
# [A|B|C]|F|B|C|D|E|F|G
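The same regex can probably handle the substitution case from the question as well. Here is a rough sketch, assuming that "substitute" means tripling each of the letters A, B and C inside the match while keeping whatever sits between them. The capturing groups and the names TEMPERED, triple_letters and substitute_first are additions of mine for illustration.

```python
import re

# Same tempered pattern as above, with capturing groups around each letter
# and around the text between the letters.
TEMPERED = re.compile(r"(A)((?:(?![AC]).)*)(B)((?:(?![AB]).)*)(C)")

def triple_letters(match):
    # Groups 2 and 4 hold the text between the letters; keep it unchanged.
    return "AAA" + match.group(2) + "BBB" + match.group(4) + "CCC"

def substitute_first(text):
    # count=1 rewrites only the first match; drop it to rewrite all of them.
    return TEMPERED.sub(triple_letters, text, count=1)

print(substitute_first("A|B|A|F|B|C|D|E|F|G"))  # A|B|AAA|F|BBB|CCC|D|E|F|G
```

Passing a function as the replacement keeps the separators intact without hard-coding them in the replacement string.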
(A[^A]*?B[^B]*?C)
You can use this simple regex. Replace with [\1].
import re
x = "A|B|A|F|B|C|D|A|B|C"
shortest = min(re.findall(r"(A[^A]*?B[^B]*?C)", x), key=len)
print(re.sub("(" + re.escape(shortest) + ")", r"[\1]", x))  # A|B|A|F|B|C|D|[A|B|C]