How to get the group name of a regular expression match in Python?

Question

The question is very basic, but I do not know how to find the group name from a match. Let me explain in code:

import re
a = list(re.finditer(r'(?P<name>[^\W\d_]+)|(?P<number>\d+)', 'Ala ma kota'))

How do I get the group name for the a[0].group(0) match? Assume that the number of named patterns can be larger.

The example is simplified to learn the basics.

I could invert the a[0].groupdict() mapping, but that would be slow.

Martijn Pieters · Accepted Answer

You can get this information from the compiled expression:

>>> pattern = re.compile(r'(?P<name>\w+)|(?P<number>\d+)')
>>> pattern.groupindex
{'name': 1, 'number': 2}

This uses the RegexObject.groupindex attribute:

A dictionary mapping any symbolic group names defined by (?P<id>) to group numbers. The dictionary is empty if no symbolic groups were used in the pattern.

If you only have access to the match object, you can get to the pattern with the MatchObject.re attribute:

>>> a = list(re.finditer(r'(?P<name>\w+)|(?P<number>\d+)', 'Ala ma kota'))
>>> a[0]
<_sre.SRE_Match object at 0x100264ad0>
>>> a[0].re.groupindex
{'name': 1, 'number': 2}
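
Since groupindex belongs to the pattern rather than to each match, it can be inverted once and reused, which avoids inverting groupdict() for every match. The following is a minimal sketch of that idea, written for Python 3; the names index_to_name and matched_group_name are illustrative and not part of the re module, and the pattern is just an example.

```python
import re

pattern = re.compile(r'(?P<number>\d+)|(?P<name>\w+)')

# Invert the name -> group number mapping once, up front.
index_to_name = {index: name for name, index in pattern.groupindex.items()}


def matched_group_name(match):
    """Return the name of the last matched capturing group, or None."""
    # match.lastindex is the integer index of the last matched capturing
    # group, or None if no group took part in the match.
    return index_to_name.get(match.lastindex)


names = [matched_group_name(m) for m in pattern.finditer('Ala ma 123 kota')]
print(names)
```

This only gives a meaningful answer when the named groups are top-level alternatives, so that exactly one of them takes part in each match.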

If all you want to know is which group matched, look at the values; None means the group was never used in the match:

>>> a[0].groupdict()
{'name': 'Ala', 'number': None}

The number group was not used to match anything, which is why its value is None.

You can then find the names of the groups that took part in the match with:

names_used = [name for name, value in matchobj.groupdict().items() if value is not None]
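
As a small self-contained sketch of the same filter (Python 3; the helper name names_used and the key/value patterns are made up for illustration), note that the test has to be "is not None" rather than plain truthiness, because a group that matched an empty string still took part in the match:

```python
import re


def names_used(match):
    """Return the names of all named groups that took part in the match."""
    return [name for name, value in match.groupdict().items()
            if value is not None]


# The optional group (...)? does not take part when nothing follows '='.
optional = re.search(r'(?P<key>\w+)=(?P<value>\w+)?', 'size=')
# Here the group does take part, but matches the empty string.
empty = re.search(r'(?P<key>\w+)=(?P<value>\w*)', 'size=')

print(names_used(optional))
print(sorted(names_used(empty)))
```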

or if there is only ever one group that can match, you can use MatchObject.lastgroup:

name_used = matchobj.lastgroup
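
One common use of lastgroup is a small tokenizer in which every alternative is a named group. The sketch below assumes Python 3 and reuses the letter-only name pattern from the question; TOKEN_RE and tokenize are hypothetical names chosen for this example.

```python
import re

# Each alternative is one named group, so exactly one group matches per token.
TOKEN_RE = re.compile(r'(?P<number>\d+)|(?P<name>[^\W\d_]+)')


def tokenize(text):
    """Return (group name, matched text) pairs for every token in text."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


tokens = tokenize('Ala ma 2 koty')
print(tokens)
```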

As a side note, your regular expression has a fatal flaw: everything that \d matches is also matched by \w, so you'll never see number used where name can match first. Reverse the order of the alternatives to avoid this:

>>> for match in re.finditer(r'(?P<name>\w+)|(?P<number>\d+)', 'word 42'):
...     print match.lastgroup
... 
name
name
>>> for match in re.finditer(r'(?P<number>\d+)|(?P<name>\w+)', 'word 42'):
...     print match.lastgroup
... 
name
number

but take into account that words starting with digits will still confuse things for your simple case:

>>> for match in re.finditer(r'(?P<number>\d+)|(?P<name>\w+)', 'word42 42word'):
...     print match.lastgroup, repr(match.group(0))
... 
name 'word42'
number '42'
name 'word'
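
One possible way to reduce this confusion, assuming that a token such as 42word should count as a single name rather than a number followed by a name, is to require word boundaries around the digits. This is only a sketch of that assumption in Python 3, not the only reasonable tokenization:

```python
import re

# \b on both sides means the number alternative only matches when the
# digits form a whole word; otherwise the name alternative takes over.
pattern = re.compile(r'(?P<number>\b\d+\b)|(?P<name>\w+)')

results = [(m.lastgroup, m.group(0))
           for m in pattern.finditer('word42 42word 42')]
print(results)
```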

davidedb · Answer

First of all, your regular expression is syntactically wrong: you should write it as r'(?P<name>\w+)|(?P<number>\d+)'. Moreover, even this regular expression does not work, since the special sequence \w matches all alphanumeric characters and hence also all characters matched by \d. You should change it to r'(?P<number>\d+)|(?P<name>\w+)' to give \d precedence over \w.

However, you can get the name of the matching group by using the lastgroup attribute of the match objects, i.e.:

[m.lastgroup for m in re.finditer(r'(?P<number>\d+)|(?P<name>\w+)', 'Ala ma 123 kota')]

producing:

['name', 'name', 'number', 'name']

Tags: python, regex, python-2.7

Chameleon