I have a line that contains a prefix followed by one or more matched patterns. For example, the prefix is a letter followed by one or more numbers separated by spaces:
s='A 3 4 5'
I would like to find a regex pattern that would extract both the prefix and the repeated patterns.
import re

s='''A 3 4 5'''
reg = re.compile(r'''
^(\w) # Prefix
(
\s* # Space separator
(\d+) # Pattern
\s* # Space separator
)*
''', re.VERBOSE)
print(reg.findall(s))
However, it only finds the prefix and a single match:
[('A', '5', '5')]
The matched pattern appears twice because I have two groups - one containing the pattern itself and one containing the pattern with its separators.
How can I retrieve a single prefix and multiple matched patterns separated by a given divider using a Python regex?
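A repeated capture group keeps only its last match, so a single pattern cannot return every number. This will require a two-level regex: first split off the prefix and the whole suffix, then pull the individual values out of the suffix. Here's an example way to do it: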
>>> import re
>>> s='''A 3 4 5'''
>>> outer_match = re.match(r'^(?P<prefix>\w)(?P<suffix>(\s*\d+\s*)*)', s)
>>> outer_match.groupdict()
{'prefix': 'A', 'suffix': ' 3 4 5'}
Then to extract the suffix pieces:
>>> prefix = outer_match.group('prefix')
>>> suffixes = re.findall(r'\s*(?P<val>\d+)\s*', outer_match.group('suffix'))
>>> suffixes
['3', '4', '5']
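The two steps above can be wrapped in a single helper. This is only a minimal sketch: the function name split_prefix_and_values is hypothetical, and it assumes a stricter format than the answer's pattern, namely a one-character prefix followed by numbers that are each preceded by at least one whitespace character. It returns None when the whole line does not fit that shape.

```python
import re

def split_prefix_and_values(line):
    """Return (prefix, [values]), or None if the line does not match.

    Hypothetical helper; assumes every number is preceded by whitespace.
    """
    # Outer pass: capture the prefix and the entire run of numbers.
    outer = re.match(r'^(?P<prefix>\w)(?P<suffix>(?:\s+\d+)*)\s*$', line)
    if outer is None:
        return None
    # Inner pass: findall returns every number, not just the last one.
    values = re.findall(r'\d+', outer.group('suffix'))
    return outer.group('prefix'), values

print(split_prefix_and_values('A 3 4 5'))
```

Requiring whitespace before each number keeps the outer pattern unambiguous, so a non-matching line fails quickly instead of backtracking through many ways of splitting a run of digits.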
This is a tricky question, owing to the fact that once the regex engine matches and consumes the prefix A, it won't revisit it for the later numbers. Here is one workaround which avoids a single all-in-one regex by finding the prefix first and then attaching it to each number:
import re

s = 'A 3 4 5'
prefix = re.findall(r'[A-Z]+', s)[0]
terms = re.sub(r'\b(\d+)\b', prefix + r'\1', s).split(' ')[1:]
print(terms)
This prints:
['A3', 'A4', 'A5']
If you don't already have the s input in the format given above, then you might have to do some massaging to arrive at this starting point before you consider the above answer.
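As an illustration of that kind of massaging, here is a small sketch that assumes the only problem with the input is irregular whitespace (extra spaces, tabs, leading or trailing blanks). The function name prefixed_terms is hypothetical; other kinds of malformed input would need their own cleanup.

```python
import re

def prefixed_terms(raw):
    # Collapse every run of whitespace to one space and trim both ends,
    # so the line has the 'A 3 4 5' shape the answer expects.
    s = ' '.join(raw.split())
    # Raises IndexError if the line has no uppercase prefix.
    prefix = re.findall(r'[A-Z]+', s)[0]
    # Prepend the prefix to each number, then drop the bare prefix token.
    return re.sub(r'\b(\d+)\b', prefix + r'\1', s).split(' ')[1:]

print(prefixed_terms('  A   3\t4  5 '))
```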