
Mark the shortest overlapping match using regular expressions

Tags:

python

regex

This post shows how to find the shortest overlapping match using regex. One of the answers shows how to get the shortest match, but I am struggling with how to locate the shortest match and mark its position, or substitute it with another string.

So in the given string,

A|B|A|F|B|C|D|E|F|G

and the pattern I want to locate is:

my_pattern = 'A.*?B.*?C'

How can I identify the shortest match and mark it in the original string like below?

A|B|[A|F|B|C]|D|E|F|G

or substitute:

A|B|AAA|F|BBB|CCC|D|E|F|G
user2870222 asked Mar 22 '15 17:03


3 Answers

I suggest using Tim Pietzcker's answer with re.sub:

>>> import re
>>> s = 'A|B|A|F|B|C|D|E|F|G'
>>> p = re.findall(r'(?=(A.*?B.*?C))', s)
>>> re.sub(r'({})'.format(re.escape(min(p, key=len))), r'[\1]', s)
'A|B|[A|F|B|C]|D|E|F|G'
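
Note that substituting the escaped text of the shortest match brackets every place that text occurs in the string. If only the position of the shortest match should be marked, one option is to collect spans with re.finditer and slice the string. This is a minimal sketch, not part of the original answer; the helper name mark_shortest and its left/right parameters are made up for illustration.

```python
import re

def mark_shortest(pattern, text, left="[", right="]"):
    """Bracket the shortest overlapping match of `pattern` in `text`."""
    # Wrapping the pattern in a lookahead makes each match zero-width, so the
    # scan restarts at every position and overlapping candidates are kept.
    lookahead = re.compile(r"(?=({}))".format(pattern), re.DOTALL)
    spans = [m.span(1) for m in lookahead.finditer(text)]
    if not spans:
        return text  # nothing matched, leave the text unchanged
    # min() returns the first of several equally short spans.
    start, end = min(spans, key=lambda s: s[1] - s[0])
    return text[:start] + left + text[start:end] + right + text[end:]

s = "A|B|A|F|B|C|D|E|F|G"
print(mark_shortest(r"A.*?B.*?C", s))  # A|B|[A|F|B|C]|D|E|F|G
```

Because the result is built from the span rather than from the matched text, only one occurrence is bracketed even if the same substring appears elsewhere.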
Mazdak answered Oct 29 '22 18:10


One way is to use negative lookaheads between A and B and then between B and C, like this:

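import re
# The lookaheads keep A and C out of the A-to-B gap, and A and B out of the B-to-C gap.
p = re.compile(r'A(?:(?![AC]).)*B(?:(?![AB]).)*C')
test_str = "A|B|A|F|B|C|D|E|F|G"
# In Python the whole match is referenced as \g<0>, not $0.
result = re.sub(p, r"[\g<0>]", test_str)
# A|B|[A|F|B|C]|D|E|F|G

test_str = "A|B|C|F|B|C|D|E|F|G"
result = re.sub(p, r"[\g<0>]", test_str)
# [A|B|C]|F|B|C|D|E|F|G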
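
For the substitution variant asked about in the question (A to AAA, B to BBB, C to CCC), the same pattern can capture the two gaps and reuse them in the replacement. This is a sketch under the assumption that only the three anchor letters should change; the function name substitute_anchors is hypothetical.

```python
import re

# Same lookahead-guarded pattern, with the A-to-B and B-to-C gaps captured
# as groups 1 and 2 so the replacement can put them back unchanged.
ANCHORS = re.compile(r"A((?:(?![AC]).)*)B((?:(?![AB]).)*)C")

def substitute_anchors(text):
    """Replace the A, B and C of each match with AAA, BBB and CCC."""
    return ANCHORS.sub(r"AAA\1BBB\2CCC", text)

print(substitute_anchors("A|B|A|F|B|C|D|E|F|G"))  # A|B|AAA|F|BBB|CCC|D|E|F|G
```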


anubhava answered Oct 29 '22 16:10


(A[^A]*?B[^B]*?C)

You can use this simple regex and replace each match with [\1].


import re
x = "A|B|A|F|B|C|D|A|B|C"
print(re.sub("(" + re.escape(min(re.findall(r"(A[^A]*?B[^B]*?C)", x), key=len)) + ")", r"[\1]", x))

vks answered Oct 29 '22 18:10