This is a very basic question, but I do not know how to find the group name from a match. Let me explain in code:
import re
a = list(re.finditer('(?P<name>[^\W\d_]+)|(?P<number>\d+)', 'Ala ma kota'))
How do I get the name of the group that produced the a[0].group(0) match, assuming the number of named patterns can be larger?
The example is simplified to learn the basics.
I could invert a[0].groupdict(), but that would be slow.
You can get this information from the compiled expression:
>>> pattern = re.compile(r'(?P<name>\w+)|(?P<number>\d+)')
>>> pattern.groupindex
{'name': 1, 'number': 2}
This uses the RegexObject.groupindex attribute:
A dictionary mapping any symbolic group names defined by (?P<id>) to group numbers. The dictionary is empty if no symbolic groups were used in the pattern.
If you only have access to the match object, you can get to the pattern with the MatchObject.re attribute:
>>> a = list(re.finditer(r'(?P<name>\w+)|(?P<number>\d+)', 'Ala ma kota'))
>>> a[0]
<_sre.SRE_Match object at 0x100264ad0>
>>> a[0].re.groupindex
{'name': 1, 'number': 2}
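To go from a match back to a group name through groupindex, one option is to invert the mapping once and look up the match's lastindex. This is only a sketch in Python 3 syntax; number_to_name and matched_name are names made up for illustration.

```python
import re

# Same pattern as in the session above.
pattern = re.compile(r'(?P<name>\w+)|(?P<number>\d+)')

# Invert groupindex once: group number -> group name.
# (number_to_name is a hypothetical helper name, not part of the re module.)
number_to_name = {number: name for name, number in pattern.groupindex.items()}

match = pattern.search('Ala ma kota')

# lastindex is the number of the last capturing group that matched,
# or None if no group took part in the match.
matched_name = number_to_name.get(match.lastindex)
```

The inversion is done once per pattern rather than once per match, so the per-match cost is a single dictionary lookup.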
If all you want to know is which group matched, look at the values; None means a group was never used in the match:
>>> a[0].groupdict()
{'name': 'Ala', 'number': None}
The number group was never used to match anything, because its value is None.
You can then find the names of the groups that took part in the match with:
names_used = [name for name, value in matchobj.groupdict().items() if value is not None]
or, if only one group can ever match, you can use MatchObject.lastgroup:
name_used = matchobj.lastgroup
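The two approaches agree only when a single named group can take part in a match. The sketch below (Python 3 syntax) contrasts an alternation with a sequence; the key=value pattern and the names_used helper are assumptions added for illustration, not part of the question.

```python
import re

def names_used(match):
    """Return the names of all named groups that took part in the match."""
    return [name for name, value in match.groupdict().items() if value is not None]

# Alternation: only one named group can take part in any given match.
alt = re.search(r'(?P<number>\d+)|(?P<name>\w+)', 'Ala ma kota')
alt_names = names_used(alt)
alt_last = alt.lastgroup

# Sequence (hypothetical pattern): both groups take part in the same match,
# so lastgroup reports only the last one while groupdict() shows both.
seq = re.search(r'(?P<key>\w+)=(?P<value>\d+)', 'x=1')
seq_names = sorted(names_used(seq))
seq_last = seq.lastgroup
```

For the alternation both approaches report name; for the sequence, lastgroup reports only value, so the groupdict() filter is the safer choice when groups can match together.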
As a side note, your regular expression has a fatal flaw: everything that \d matches is also matched by \w. You'll never see number used where name can match first. Reverse the alternatives in the pattern to avoid this:
>>> for match in re.finditer(r'(?P<name>\w+)|(?P<number>\d+)', 'word 42'):
...     print match.lastgroup
...
name
name
>>> for match in re.finditer(r'(?P<number>\d+)|(?P<name>\w+)', 'word 42'):
...     print match.lastgroup
...
name
number
Keep in mind, though, that words starting with digits will still confuse things in your simple case:
>>> for match in re.finditer(r'(?P<number>\d+)|(?P<name>\w+)', 'word42 42word'):
...     print match.lastgroup, repr(match.group(0))
...
name 'word42'
number '42'
name 'word'
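One possible way around the split of 42word, assuming a number should only count when the digits stand alone as a whole word, is to wrap \d+ in \b word boundaries. This is a sketch of that assumption in Python 3 syntax, not something the question asked for; the test string is made up.

```python
import re

# Assumption: a "number" is a run of digits delimited by word boundaries,
# so digits glued to letters fall through to the name alternative.
pattern = re.compile(r'(?P<number>\b\d+\b)|(?P<name>\w+)')

# Pair each match's group name with the text it matched.
tokens = [(m.lastgroup, m.group(0)) for m in pattern.finditer('word42 42word 42')]
```

With this pattern 42word stays in one piece as a name, and only the free-standing 42 is reported as a number.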
First of all, write your regular expression as a raw string: r'(?P<name>\w+)|(?P<number>\d+)'. Moreover, even this regular expression does not work as intended, since the special sequence \w matches all alphanumeric characters and hence also all characters matched by \d.
You should change it to r'(?P<number>\d+)|(?P<name>\w+)' to give \d precedence over \w.
You can then get the name of the matching group from the lastgroup attribute of the match objects, e.g.:
[m.lastgroup for m in re.finditer(r'(?P<number>\d+)|(?P<name>\w+)', 'Ala ma 123 kota')]
producing:
['name', 'name', 'number', 'name']
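If the goal is to collect the matched text per group rather than just list the names, lastgroup can serve as a dictionary key. A minimal sketch in Python 3 syntax, with by_group as a made-up variable name:

```python
import re
from collections import defaultdict

pattern = re.compile(r'(?P<number>\d+)|(?P<name>\w+)')

# Bucket every match under the name of the group that produced it.
by_group = defaultdict(list)
for m in pattern.finditer('Ala ma 123 kota'):
    by_group[m.lastgroup].append(m.group(0))
```

This relies on the pattern being a plain alternation, so that exactly one named group takes part in each match.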