Capturing named groups in regex with re.findall

While trying to answer the question "regex to split %ages and values in python", I noticed that I had to re-order the groups in the result of findall. For example:

import re

data = """34% passed 23% failed 46% deferred"""
# Swap each (value, key) tuple from findall into key: value
result = {key: value for value, key in re.findall(r'(\w+)%\s(\w+)', data)}
print(result)
# {'passed': '34', 'failed': '23', 'deferred': '46'}

Here the result of the findall is:

>>> re.findall(r'(\w+)%\s(\w+)', data)
[('34', 'passed'), ('23', 'failed'), ('46', 'deferred')]

Is there a way to change/specify the order of the groups that makes re.findall return:

[('passed', '34'), ('failed', '23'), ('deferred', '46')]

Just to clarify, the question is:

Is it possible to specify the order of, or re-order, the groups in the return value of the re.findall function?

I used the example above, which builds a dictionary, to give a use case for changing the order (making the second group the key and the first group the value).

Further clarification:

To handle groups in larger, more complicated regexes, you can name them, but those names are only accessible through the match object returned by re.search or re.match. From what I have read, findall returns the groups at fixed indices in each tuple. The question is whether anyone knows how those indices could be modified. This would make handling groups easier and more intuitive.

ashwinjv asked Sep 02 '14 17:09


1 Answer

Take 3, based on a further clarification of the OP's intent in this comment.

Ashwin is correct that findall does not preserve named capture groups (e.g. (?P<name>regex)). finditer to the rescue! It returns the individual match objects one-by-one. Simple example:

import re

data = """34% passed 23% failed 46% deferred"""
# Each m is a match object, so the named groups are available by name
for m in re.finditer(r'(?P<percentage>\w+)%\s(?P<word>\w+)', data):
    print( m.group('percentage'), m.group('word') )
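
To get exactly the list the question asks for, one option is to request the groups by name in the order you want. This is a minimal sketch, not the only approach: it relies on the fact that a match object's group() accepts several group names and returns them as a tuple in the order given. The variable names pattern, pairs, result and rows are only illustrative.

```python
import re

data = "34% passed 23% failed 46% deferred"
pattern = re.compile(r'(?P<percentage>\w+)%\s(?P<word>\w+)')

# group() with several names returns a tuple in the order requested,
# so the groups can be re-ordered without touching the regex itself.
pairs = [m.group('word', 'percentage') for m in pattern.finditer(data)]
print(pairs)
# [('passed', '34'), ('failed', '23'), ('deferred', '46')]

# The re-ordered pairs feed directly into dict().
result = dict(pairs)
print(result)
# {'passed': '34', 'failed': '23', 'deferred': '46'}

# groupdict() gives every named group of a match, keyed by name.
rows = [m.groupdict() for m in pattern.finditer(data)]
print(rows[0])
# {'percentage': '34', 'word': 'passed'}
```

For larger patterns, groupdict() avoids positional indices altogether, which seems closest to the "easier and more intuitive" handling the question is after.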

Dan Lenski answered Sep 28 '22 04:09