
Python regexp: get all group's sequence

Tags: python, regex

I have a regex like '^(a|ab|1|2)+$' and want to get the sequence of substrings that the group matched.

For example, for re.search(reg, 'ab1') I want to get ('ab', '1').

I can get an equivalent result with the pattern '^(a|ab|1|2)(a|ab|1|2)$', but I don't know in advance how many blocks (pattern)+ will match.

Is this possible, and if so, how?

vp_arth asked Aug 04 '13 17:08


3 Answers

Try this:

import re

# Put 'ab' before 'a' so the longer alternative is tried first.
r = re.compile(r'(ab|a|1|2)')
for i in r.findall('ab1'):
    print(i)

The ab option has been moved to the front so that the pattern matches ab in preference to just a. The findall method scans the string from left to right and returns a list of all non-overlapping matches. In this simple example, with a single group, you get back a list of strings, one per match. If the pattern had more than one group, you would get back a list of tuples, each containing one string per group.
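To illustrate the difference between one group and several, here is a minimal sketch. The two-group pattern is a hypothetical example, not something from the question:

```python
import re

# One capturing group: findall returns a list of strings.
single = re.findall(r'(ab|a|1|2)', 'ab1')

# Two capturing groups (hypothetical pattern, for illustration only):
# findall returns a list of tuples, one string per group.
pairs = re.findall(r'([a-z]+)(\d)', 'ab1cd2')

print(single)  # ['ab', '1']
print(pairs)   # [('ab', '1'), ('cd', '2')]
```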

This should work for your second example:

pattern = r'(7325189|7325|9087|087|18)'
text = '7325189087'  # named text rather than str, to avoid shadowing the built-in
res = re.compile(pattern).findall(text)
print(pattern, text, res)

I removed the ^ and $ anchors from the pattern because, for findall to find more than one substring, it has to be able to match anywhere in the string. I also removed the + so that each match is a single occurrence of one of the alternatives.
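One caveat with dropping the anchors: findall silently skips characters that match none of the alternatives, so the result alone does not show that the whole string was consumed. A minimal sketch of a check follows. The helper name split_tokens is hypothetical, and it assumes the pattern either has no capturing groups or has one group spanning the whole match:

```python
import re

def split_tokens(pattern, text):
    """Return the findall matches, or None if they do not cover all of text."""
    tokens = re.findall(pattern, text)
    # Matches are non-overlapping and in order, so they cover the whole
    # string exactly when joining them rebuilds it.
    if ''.join(tokens) != text:
        return None
    return tokens

print(split_tokens(r'ab|a|1|2', 'ab1'))   # ['ab', '1']
print(split_tokens(r'ab|a|1|2', 'abx1'))  # None
```

Note that findall takes the first alternative that matches at each position and never backtracks across matches, so this check can reject a string that the anchored pattern with + would accept.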

nio answered Nov 05 '22 19:11


Your original expression does match the way you want it to; it just matches the entire string and doesn't capture a separate group for each repetition. When a group is followed by a repetition operator ('+', '*', '{m,n}'), its contents are overwritten on each repetition, and only the final match is saved. This is alluded to in the documentation:

If a group matches multiple times, only the last match is accessible.
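A short sketch of that behaviour, using the pattern and input from the question:

```python
import re

m = re.search(r'^(a|ab|1|2)+$', 'ab1')

print(m.group(0))  # ab1     (the whole match)
print(m.groups())  # ('1',)  (only the last repetition of the group)
```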

ebenpack answered Nov 05 '22 18:11


I don't think you need regexes for this problem; you need some kind of recursive graph search function.
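One possible reading of this suggestion is a recursive backtracking search over the alternatives. The sketch below is an assumption about what is meant, not the answerer's code; the function name tokenizations is hypothetical, and it assumes every token is a non-empty literal string:

```python
def tokenizations(tokens, text):
    """Yield every way to write text as a concatenation of tokens.

    Assumes all tokens are non-empty; an empty token would recurse forever.
    """
    if not text:
        yield ()  # nothing left to split: one (empty) tokenization
        return
    for tok in tokens:
        if text.startswith(tok):
            # Consume tok, then split the remainder recursively.
            for rest in tokenizations(tokens, text[len(tok):]):
                yield (tok,) + rest

print(list(tokenizations(['a', 'ab', '1', '2'], 'ab1')))
# [('ab', '1')]

print(list(tokenizations(['7325189', '7325', '9087', '087', '18'], '7325189087')))
# [('7325189', '087'), ('7325', '18', '9087')]
```

Unlike findall, this returns every valid split, which shows that the second input can be tokenized in two ways.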

wedem answered Nov 05 '22 17:11