
How to make a Python regex that matches multiple patterns at the same index

Is it possible to get all overlapping matches that start at the same index but come from different matching groups?

For example, when I search "ABC" for the pattern "(A)|(AB)", the regex should return the following matches:

(0,"A") and (0,"AB")

Mikael Lepistö asked May 23 '11 17:05




2 Answers

For one possibility, see Evpok's answer. Another interpretation of your question is that you want to match all patterns at the same time, starting from the same position. In that case you can use lookahead expressions. For example, the regular expression

(?=(A))(?=(AB))

will give you the desired result, i.e. every position where both patterns match, together with the captured groups.
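As a rough illustration of the lookahead idea, the sketch below runs that expression with finditer and reports the start index together with both groups. The names both and matches_at_same_index are made up for this example and are not part of the original answer.

```python
import re

# Two lookaheads at the same position: each captures its own group
# without consuming input, so both groups start at the same index.
both = re.compile(r"(?=(A))(?=(AB))")

def matches_at_same_index(text):
    """Return (index, group1, group2) wherever both patterns match."""
    return [(m.start(), m.group(1), m.group(2)) for m in both.finditer(text)]

print(matches_at_same_index("ABC"))      # [(0, 'A', 'AB')]
print(matches_at_same_index("AABAABA"))  # [(1, 'A', 'AB'), (4, 'A', 'AB')]
```

Note that positions where only "A" matches (index 0 of "AABAABA", for instance) are skipped, because both lookaheads must succeed.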

Update: Given the additional clarification, this can still be done with a single regex. You just have to make both lookahead groups above optional, i.e.

(?=(A))?(?=(AB))?(?:(?:A)|(?:AB))

Nevertheless, I wouldn't recommend doing so. It is much easier to search for each pattern separately and join the results afterwards.

import re

string = "AABAABA"
# Collect (start index, matched text) for each pattern separately, then join the lists.
result = [(g.start(), g.group()) for g in re.compile('A').finditer(string)]
result += [(g.start(), g.group()) for g in re.compile('AB').finditer(string)]
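If the results should come out grouped by index, one possible variation on the "search each pattern separately" idea is to try every pattern at every position with an anchored match. This is only a sketch: matches_per_index is a hypothetical helper name, and it assumes the patterns are plain strings that re.compile accepts.

```python
import re

def matches_per_index(patterns, text):
    """Try every pattern at every index; return (index, matched_text) pairs."""
    compiled = [re.compile(p) for p in patterns]
    found = []
    for pos in range(len(text)):
        for regex in compiled:
            # match() with a pos argument only matches starting at pos;
            # it does not scan ahead the way search() does.
            m = regex.match(text, pos)
            if m:
                found.append((pos, m.group()))
    return found

print(matches_per_index(["A", "AB"], "ABC"))  # [(0, 'A'), (0, 'AB')]
```

Because each pattern is tried independently at each position, this also reports matches of a single pattern that overlap each other, at the cost of one match attempt per pattern per index.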
Howard answered Oct 14 '22 14:10



I got this from somewhere, though I can't recall where or from whom:

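def myfindall(regex, seq):
    # regex must be a compiled pattern, e.g. re.compile('AA').
    resultlist = []
    pos = 0
    while True:
        result = regex.search(seq, pos)
        if result is None:
            break
        resultlist.append(seq[result.start():result.end()])
        # Resume one character after the match start so overlapping matches are found.
        pos = result.start() + 1
    return resultlist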

It returns a list of all matches, including overlapping ones, with the limitation that there is at most one match per index.
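A short usage sketch may make that limitation concrete. The function is repeated here so the snippet runs on its own; the sample patterns and strings are chosen for illustration only.

```python
import re

def myfindall(regex, seq):
    """Collect matches, restarting the search one character after each match start."""
    resultlist = []
    pos = 0
    while True:
        result = regex.search(seq, pos)
        if result is None:
            break
        resultlist.append(seq[result.start():result.end()])
        pos = result.start() + 1
    return resultlist

# Overlapping matches of one pattern are all found.
print(myfindall(re.compile("AA"), "AAAA"))  # ['AA', 'AA', 'AA']

# Only one match per index: at index 0 the alternation yields "A", never "AB".
print(myfindall(re.compile("A|AB"), "ABC"))  # ['A']
```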

Evpok answered Oct 14 '22 14:10
