extract a prefix and multiple subsequent matches

Question

My Problem

I have a line that contains a prefix followed by one or more matched patterns. For example, the prefix is a single letter, and it is followed by one or more numbers separated by spaces:

s='A 3 4 5'

I would like to find a regex pattern that would extract both the prefix and the repeated patterns.

What I Have Tried

import re
s='''A 3 4 5'''
reg = re.compile(r'''
    ^(\w)       # Prefix
    (
        \s*     # Space separator
        (\d+)   # Pattern
        \s*     # Space separator
    )*
''', re.VERBOSE)
print(reg.findall(s))

However, it finds only the prefix and the last of the numbers:

[('A', '5', '5')]

The matched pattern appears twice because I have two groups - one containing the pattern itself and one containing the pattern with its separators.
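To make the behavior above concrete, here is a minimal sketch (the variable names are just for illustration). It shows that a repeated capturing group keeps only its final repetition, and that making the outer group non-capturing with `(?:...)` removes the duplicate column but still does not return all of the numbers:

```python
import re

s = 'A 3 4 5'

# Both groups capture, and both are inside a repetition,
# so each one retains only its final iteration.
capturing = re.compile(r'^(\w)(\s*(\d+)\s*)*')
last_only = capturing.findall(s)

# A non-capturing outer group drops the duplicate entry,
# but the inner group still holds only the last repetition.
non_capturing = re.compile(r'^(\w)(?:\s*(\d+)\s*)*')
still_last = non_capturing.findall(s)

print(last_only)   # [('A', '5', '5')]
print(still_last)  # [('A', '5')]
```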

My Question

How can I retrieve a single prefix and multiple matched patterns separated by a given divider using a Python regex?

jterrace · Accepted Answer

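Python's re module keeps only the last match of a repeated capturing group, so this requires a two-step approach: first match the prefix and the whole suffix, then pull the individual values out of the suffix. Here's one way to do it: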

>>> import re
>>> s='''A 3 4 5'''
>>> outer_match = re.match(r'^(?P<prefix>\w)(?P<suffix>(\s*\d+\s*)*)', s)
>>> outer_match.groupdict()
{'prefix': 'A', 'suffix': ' 3 4 5'}

Then, to extract the individual pieces from the suffix:

>>> prefix = outer_match.group('prefix')
>>> suffixes = re.findall(r'\s*(?P<val>\d+)\s*', outer_match.group('suffix'))
>>> suffixes
['3', '4', '5']
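The two steps above could be wrapped in a small helper, sketched below. The function name `split_prefix_and_values` is hypothetical, and the pattern is slightly stricter than the one in the session above: it assumes each number is preceded by at least one whitespace character and that nothing else follows the numbers, so malformed lines return None instead of a partial match.

```python
import re

# Step 1 pattern: one word character as the prefix, then zero or more
# whitespace-separated numbers, then optional trailing whitespace.
# Note that \w also matches digits and underscores, not only letters.
OUTER = re.compile(r'^(?P<prefix>\w)(?P<suffix>(?:\s+\d+)*)\s*$')


def split_prefix_and_values(line):
    """Return (prefix, [values]) for a matching line, or None otherwise."""
    outer = OUTER.match(line)
    if outer is None:
        return None
    # Step 2: the suffix contains only whitespace and digits,
    # so findall can collect every number in it.
    values = re.findall(r'\d+', outer.group('suffix'))
    return outer.group('prefix'), values


print(split_prefix_and_values('A 3 4 5'))  # ('A', ['3', '4', '5'])
print(split_prefix_and_values('A 3 x'))    # None
```

Anchoring the outer pattern at the end of the line is a design choice: without it, a line such as 'A 3 x' would silently yield only the numbers that appear before the unexpected token.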

Tim Biegeleisen · Answer

This is a tricky question, owing to the fact that once the regex engine matches and consumes the prefix A, it won't check it again. Here is one workaround that avoids doing everything in a single regex:

import re
s = 'A 3 4 5'
prefix = re.findall(r'[A-Z]+', s)[0]
terms = re.sub(r'\b(\d+)\b', prefix + r'\1', s).split(' ')[1:]
print(terms)

This prints:

['A3', 'A4', 'A5']

If your input s is not already in the format shown above, you may need to preprocess it into that form before applying this approach.
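If the input really is just whitespace-separated tokens with the prefix first, a plain `str.split` may be enough and no regex is needed at all. This is only a sketch under that assumption: it does not check that the prefix is a letter or that the remaining tokens are numbers.

```python
s = 'A 3 4 5'

# Split on whitespace; the first token is the prefix,
# and the starred target collects the remaining tokens.
prefix, *values = s.split()

# Rebuild the prefixed terms shown above from the separated pieces.
terms = [prefix + value for value in values]

print(values)  # ['3', '4', '5']
print(terms)   # ['A3', 'A4', 'A5']
```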

Tags: python, regex, regex-group

Adam Matan