How can I get all possible subgroups in Python regex?

I would like to get all captures of a repeated group that contains a subgroup, (group(subgroup))+, using re.findall. Currently it only returns the last match, for example:

>>> re.findall(r'SOME_STRING_(([A-D])[0-9]+)+_[A-Z]+', 'SOME_STRING_A2B2C3_OTK')
[('C3', 'C')]

Now I have to do that in two steps:

>>> match = re.match(r'SOME_STRING_(([A-D][0-9]+)+)_[A-Z]+', 'SOME_STRING_A2B2C3_OTK')
>>> re.findall(r'([A-D])[0-9]+', match.group(1))
['A', 'B', 'C']

Is there any way to get the same result in a single step?

asked Mar 10 '26 11:03 by Wang


1 Answer

Since (([A-D])[0-9]+)+ is a repeated capturing group, each repetition overwrites the group's previous capture, so only the values from the last iteration are returned.
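A quick way to see this with the standard re module: match.group(n) and match.span(n) both refer only to the final repetition of the quantified group, even though every repetition took part in the match. This small sketch reuses the question's pattern and input.

```python
import re

pattern = r'SOME_STRING_(([A-D])[0-9]+)+_[A-Z]+'
m = re.match(pattern, 'SOME_STRING_A2B2C3_OTK')

# The whole input matched, so all three repetitions (A2, B2, C3) ran...
print(m.group(0))     # => SOME_STRING_A2B2C3_OTK
# ...but groups 1 and 2 only hold the values from the last repetition.
print(m.group(1, 2))  # => ('C3', 'C')
# The span of group 1 likewise covers only the last repetition ("C3").
print(m.span(1))      # => (16, 18)
```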

You may use the PyPI regex library (install it by typing pip install regex in the console/terminal) and then use:

import regex

results = regex.finditer(r'SOME_STRING_(([A-D])[0-9]+)+_[A-Z]+', 'SOME_STRING_A2B2C3_OTK')
print([list(zip(x.captures(1), x.captures(2))) for x in results])  # zip is lazy in Python 3, so wrap it in list()
# => [[('A2', 'A'), ('B2', 'B'), ('C3', 'C')]]

The match.captures(n) method returns a list of every capture of group n, not just the last one.

If you can only use re, you need to first extract all your matches, and then run a second regex on them to extract the parts you need:

import re
tmp = re.findall(r'SOME_STRING_((?:[A-D][0-9]+)+)_[A-Z]+', 'SOME_STRING_A2B2C3_OTK')
results = []
for m in tmp:
    results.append(re.findall(r'(([A-D])[0-9]+)', m))
print( results )
# => [[('A2', 'A'), ('B2', 'B'), ('C3', 'C')]]

See the Python demo
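The same two-pass idea can be packaged as a helper that compiles both patterns once and handles several SOME_STRING_... occurrences in one input. This is a minimal sketch: the names OUTER, INNER and extract_pairs are hypothetical, and it still runs two regex passes under the hood, because re does not retain every repetition of a group.

```python
import re

# Hypothetical names. OUTER grabs the whole repeated run (e.g. "A2B2C3")
# with a non-capturing group; INNER splits that run into (code, letter) pairs.
OUTER = re.compile(r'SOME_STRING_((?:[A-D][0-9]+)+)_[A-Z]+')
INNER = re.compile(r'(([A-D])[0-9]+)')

def extract_pairs(text):
    """Return one list of (code, letter) tuples per outer match in text."""
    return [INNER.findall(m.group(1)) for m in OUTER.finditer(text)]

print(extract_pairs('SOME_STRING_A2B2C3_OTK'))
# => [[('A2', 'A'), ('B2', 'B'), ('C3', 'C')]]
print(extract_pairs('SOME_STRING_A2B2C3_OTK SOME_STRING_D10_XY'))
# => [[('A2', 'A'), ('B2', 'B'), ('C3', 'C')], [('D10', 'D')]]
```

Using finditer for the outer pass (rather than findall) keeps the full match object available, which is handy if you also need positions via m.start() or m.end().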

answered Mar 13 '26 05:03 by Wiktor Stribiżew