
Python regular expressions - how to capture multiple groups from a wildcard expression?

I have a Python regular expression that contains a group which can occur zero or more times, but when I retrieve the groups afterwards, only the last match is present. Example:

re.search(r"(\w)*", "abcdefg").groups()

this returns the tuple ('g',)

I need it to return ('a','b','c','d','e','f','g',)

Is that possible? How can I do it?

John B asked Jan 21 '09 10:01


People also ask

How do I capture a group in RegEx?

Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g".
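A minimal sketch of the (dog) example using Python's standard re module. The input string "hotdogdogdog!" is an illustrative assumption. It shows that the quantifier repeats the whole three-letter group as one unit, while the group itself still holds a single copy.

```python
import re

# "(dog)" is one capturing group; "+" repeats the whole group, not just "g".
m = re.search(r"(dog)+", "hotdogdogdog!")
print(m.group(0))  # the full match: "dogdogdog"
print(m.group(1))  # group 1 holds one copy: "dog"
print(m.span(1))   # position of that copy in the input: (9, 12)
```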

What is a capturing group in Python regex?

Capturing groups are a handy feature of regular expression matching: they let you query the Match object to find out which part of the string matched a particular part of the regular expression. Any plain parenthesized sub-pattern (...) is a capture group; forms such as (?:...) group without capturing.
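As a rough illustration (the date-like input and the pattern are assumptions chosen for this sketch), the Match object can be queried per group, and a (?:...) group is not counted among the capturing groups:

```python
import re

# Two capturing groups; "(?:\d{2})" groups the month without capturing it.
m = re.search(r"(\d{4})-(?:\d{2})-(\d{2})", "asked 2009-01-21")
year, day = m.group(1), m.group(2)  # "2009", "21"
print(m.span(1))                    # where group 1 matched: (6, 10)
print(m.re.groups)                  # number of capturing groups: 2
```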

When capturing regex groups, what data type does the groups() method return?

The groups() method returns a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern.
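A small sketch of that behavior, assuming a made-up "key=value" input: a group that does not take part in the match shows up as None in the tuple, and groups() accepts a default to use instead.

```python
import re

# Group 2 is optional, so it may not participate in the match.
m = re.match(r"(\w+)=(\w+)?", "debug=")
print(type(m.groups()))  # <class 'tuple'>
print(m.groups())        # ('debug', None): non-participating groups default to None
print(m.groups(""))      # ('debug', ''): the argument replaces None
```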

How do you use wildcards in RegEx?

In regular expressions, the period (., also called "dot") is the wildcard that matches any single character except a newline (unless the DOTALL flag is set). Combined with the asterisk operator as .*, it matches any number of any characters.
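A minimal sketch of the newline caveat, using an assumed two-line input string: without re.DOTALL, .* cannot cross the line break.

```python
import re

text = "start: abc\ndef :end"

# By default "." does not match "\n", so ".*" cannot bridge the line break.
plain = re.search(r"start:.*:end", text)
print(plain)                    # None

# With re.DOTALL, "." also matches "\n", and ".*" spans both lines.
dotall = re.search(r"start:.*:end", text, re.DOTALL)
print(dotall.group(0) == text)  # True
```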


2 Answers

import re
re.findall(r"\w", "abcdefg")  # returns ['a', 'b', 'c', 'd', 'e', 'f', 'g']
Douglas Leeder answered Sep 19 '22 00:09
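A short sketch expanding on the findall approach (the extra patterns are illustrative, not part of the answer). findall returns a list rather than a tuple; when the pattern contains exactly one group, it returns that group's text for each match; and finditer yields Match objects when positions are needed.

```python
import re

text = "abcdefg"

# With no groups in the pattern, findall returns every full match as a string.
letters = re.findall(r"\w", text)
print(letters)         # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print(tuple(letters))  # the tuple shape the question asked for

# With exactly one group, findall returns that group's text for each match:
# the matches are "ab", "cd", "ef", "g", so the result is ['a', 'c', 'e', 'g'].
print(re.findall(r"(\w)\w?", text))

# finditer yields Match objects, so positions are available as well.
spans = [m.span() for m in re.finditer(r"\w", text)]
print(spans[:3])       # [(0, 1), (1, 2), (2, 3)]
```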


In addition to Douglas Leeder's solution, here is the explanation:

In regular expressions, the group count is fixed. Placing a quantifier after a group does not increase the group count (imagine if every later group's index shifted whenever an earlier group happened to match more than once).
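To make this concrete with the pattern from the question: the compiled pattern reports one group no matter what the input is, and that group records only its last repetition.

```python
import re

pattern = re.compile(r"(\w)*")
print(pattern.groups)               # 1: fixed by the pattern, not by the input

m = pattern.search("abcdefg")
print(m.groups())                   # ('g',): only the last repetition is kept
print(m.span(1))                    # (6, 7): group 1 points at that final "g"

# Zero repetitions still yield exactly one slot, holding None.
print(pattern.search("").groups())  # (None,)
```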

Groups with quantifiers are the way to treat a complex sub-expression as a single unit when it needs to match more than once. Because each group has only one slot, the regex engine can save only the last match to it. In short: there is no way to achieve what you want with a single plain regular expression, so you have to find another way.
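One possible "other way" with the standard re module, sketched under assumptions: capture the whole repeated region with a single outer group, then split that substring with a second findall. The helper name repeated_items and the "id: ...;" input format are hypothetical, chosen only for illustration.

```python
import re

def repeated_items(text):
    """Hypothetical helper: return each repeated item as a tuple, or () if no match."""
    # Step 1: one outer group captures everything the repetition matched.
    m = re.search(r"id: ((?:\w)*);", text)
    if m is None:
        return ()
    # Step 2: split that captured region into its individual items.
    return tuple(re.findall(r"\w", m.group(1)))

print(repeated_items("id: abcdefg;"))  # ('a', 'b', 'c', 'd', 'e', 'f', 'g')
```

The two-step split keeps each pattern simple: the first regex locates the region, and the second one enumerates the items that the first could only record once.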

Tomalak answered Sep 22 '22 00:09