
extract a prefix and multiple subsequent matches

My Problem

I have a line that contains a prefix followed by one or more matches of a repeated pattern. For example, the prefix is a single letter, and it is followed by one or more numbers separated by spaces:

s='A 3 4 5'

I would like to find a regex pattern that would extract both the prefix and the repeated patterns.

What I Have Tried

import re

s='''A 3 4 5'''
reg = re.compile(r'''
    ^(\w)       # Prefix
    (
        \s*     # Space separator
        (\d+)   # Pattern
        \s*     # Space separator
    )*
''', re.VERBOSE)
print(reg.findall(s))

However, it finds only the prefix and the last match:

[('A', '5', '5')]

The matched pattern appears twice because I have two groups - one containing the pattern itself and one containing the pattern with its separators.
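
As a small illustration of the duplication described above, the sketch below uses the same pattern but makes the outer group non-capturing with `(?:...)`. This only removes the second copy; the repeated group still reports a single value, so it does not solve the problem. The variable name `result` is just for illustration.

```python
import re

s = 'A 3 4 5'

# Same pattern as above, but the outer group is non-capturing: (?: ... )
reg = re.compile(r'''
    ^(\w)       # Prefix (capturing group 1)
    (?:         # Non-capturing wrapper around separator + pattern
        \s*     # Space separator
        (\d+)   # Pattern (capturing group 2)
        \s*     # Space separator
    )*
''', re.VERBOSE)

result = reg.findall(s)
print(result)  # [('A', '5')]
```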

My Question

How can I retrieve a single prefix and multiple matched patterns separated by a given divider using a Python regex?

Adam Matan asked Nov 16 '25 10:11



2 Answers

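Python's re module keeps only the last match of a repeated capturing group, so this requires a two-level regex: one pass to separate the prefix from the suffix, and a second pass to pull the individual values out of the suffix. Here's one way to do it: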

>>> import re
>>> s='''A 3 4 5'''
>>> outer_match = re.match(r'^(?P<prefix>\w)(?P<suffix>(\s*\d+\s*)*)', s)
>>> outer_match.groupdict()
{'prefix': 'A', 'suffix': ' 3 4 5'}

Then to extract the suffix pieces:

>>> prefix = outer_match.group('prefix')
>>> suffixes = re.findall(r'\s*(?P<val>\d+)\s*', outer_match.group('suffix'))
>>> suffixes
['3', '4', '5']
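
The two steps above could be wrapped into a single helper, roughly as sketched below. The function name `split_prefix_and_values` is hypothetical, and returning `None` when the line does not start with a prefix is an assumption about the desired behaviour, not something the question specifies.

```python
import re

def split_prefix_and_values(line):
    """Return (prefix, [values]) for a line such as 'A 3 4 5', or None."""
    # Level 1: separate the one-character prefix from the run of numbers.
    outer = re.match(r'^(?P<prefix>\w)(?P<suffix>(?:\s*\d+\s*)*)', line)
    if outer is None:
        return None  # Assumption: no prefix means no result.
    # Level 2: collect every number from the suffix.
    values = re.findall(r'\d+', outer.group('suffix'))
    return outer.group('prefix'), values

print(split_prefix_and_values('A 3 4 5'))  # ('A', ['3', '4', '5'])
```

The outer group is non-capturing here because only the named groups are read afterwards.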
jterrace answered Nov 19 '25 00:11



This is a tricky question, owing to the fact that once the regex engine matches and consumes the prefix A, it won't check it again. Here is one workaround which avoids doing everything in a single regex match:

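import re

s = 'A 3 4 5'
# Take the first run of uppercase letters as the prefix
prefix = re.findall(r'[A-Z]+', s)[0]
# Prepend the prefix to every number, then drop the standalone prefix
terms = re.sub(r'\b(\d+)\b', prefix + r'\1', s).split(' ')[1:]
print(terms)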

This prints:

['A3', 'A4', 'A5']

If your input s is not already in the format shown above, you may need to preprocess it into that form before applying this approach.
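
If the goal is to keep the prefix and the numbers as separate items instead of joined terms, one possible sketch is a single `re.findall` over an alternation. It assumes the line is non-empty and really starts with the prefix; it does not validate the rest of the line, and the names `tokens` and `values` are illustrative only.

```python
import re

s = 'A 3 4 5'

# '^\w' can match only at the start of the string (the prefix);
# '\d+' then picks up each number that follows.
tokens = re.findall(r'^\w|\d+', s)

# Assumption: the first token is the prefix, the rest are the values.
prefix, values = tokens[0], tokens[1:]
print(prefix, values)  # A ['3', '4', '5']
```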

Tim Biegeleisen answered Nov 18 '25 23:11
