
RegEx with multiple groups?

Tags:

python

regex

I'm getting confused returning multiple groups in Python. My RegEx is this:

lun_q = 'Lun:\s*(\d+\s?)*' 

And my string is

s = '''Lun:                     0 1 2 3 295 296 297 298'''

I get back a match object and then look at its groups, but all it shows is the last number (298):

>>> r.groups()
(u'298',)

Why isn't it returning groups of 0, 1, 2, 3, 295, etc.?

joslinm asked Feb 10 '11 22:02



2 Answers

Your regex only contains a single pair of parentheses (one capturing group), so you only get one group in your match. If you use a repetition operator on a capturing group (+ or *), the group gets "overwritten" each time the group is repeated, meaning that only the last match is captured.
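The overwriting behaviour described above can be seen in a short sketch. It reuses the pattern and string from the question; the names `last_only` and `all_numbers` are only illustrative, and the second pattern is one possible way to capture the whole run once, not the only one.

```python
import re

s = 'Lun:                     0 1 2 3 295 296 297 298'

# A repeated capturing group keeps only the text of its final iteration.
r = re.search(r'Lun:\s*(\d+\s?)*', s)
last_only = r.groups()

# Capturing the whole run once (with a non-capturing group inside for the
# repetition) keeps every number, which can then be split apart.
r_all = re.search(r'Lun:\s*((?:\d+\s?)*)', s)
all_numbers = r_all.group(1).split()

print(last_only)    # ('298',)
print(all_numbers)  # ['0', '1', '2', '3', '295', '296', '297', '298']
```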

In your example here, you're probably better off using .split(), in combination with a regex:

import re

lun_q = r'Lun:\s*(\d+(?:\s+\d+)*)'
s = '''Lun: 0 1 2 3 295 296 297 298'''

r = re.search(lun_q, s)

if r:
    # group(1) holds the whole run of numbers as one string
    luns = r.group(1).split()

    # optionally, also convert luns from strings to integers
    luns = [int(lun) for lun in luns]
Ben Blank answered Sep 23 '22 23:09


Another approach would be to use the regex you have to validate your data and then use a more specific regex that targets each item you wish to extract using a match iterator.

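import re

s = '''Lun: 0 1 2 3 295 296 297 298'''

# first validate the overall shape of the line
lun_validate_regex = re.compile(r'Lun:\s*((\d+)(\s\d+)*)')
match = lun_validate_regex.match(s)
if match:
    # then pull out each number; \d{1,3} assumes at most three digits per number
    token_regex = re.compile(r"\d{1,3}")
    match_iterator = token_regex.finditer(match.group(1))
    for token_match in match_iterator:
        # do something brilliant with each number
        print(token_match.group())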
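As a compact variation on the same validate-then-extract idea, the two steps could be wrapped in a helper. This is only a sketch: the function name `parse_luns` is hypothetical, it swaps `finditer` for `re.findall`, and it assumes the numbers are separated by whitespace and should be returned as integers.

```python
import re

def parse_luns(line):
    """Return the numbers after 'Lun:' as ints, or [] if the line does not match."""
    # Validate the overall shape and capture the whole run of numbers once.
    match = re.match(r'Lun:\s*(\d+(?:\s+\d+)*)', line)
    if match is None:
        return []
    # Extract each number from the captured run.
    return [int(tok) for tok in re.findall(r'\d+', match.group(1))]

print(parse_luns('Lun: 0 1 2 3 295 296 297 298'))
```

Returning an empty list for a non-matching line is a design choice for this sketch; raising an exception would be equally reasonable if a malformed line should be treated as an error.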
pokstad answered Sep 21 '22 23:09