 

How do I regex match with grouping with unknown number of groups

Tags:

python

regex

I want to do a regex match (in Python) on the output log of a program. The log contains some lines that look like this:

...
VALUE 100 234 568 9233 119
...
VALUE 101 124 9223 4329 1559
...

I would like to capture the list of numbers that follows the first occurrence of a line starting with VALUE. That is, I want it to return ('100','234','568','9233','119'). The problem is that I do not know in advance how many numbers there will be.

I tried to use this as a regex:

VALUE (?:(\d+)\s)+ 

This matches the line, but it only captures the last value, so I just get ('119',).

Lorin Hochstein asked Sep 10 '09 20:09




1 Answer

What you're looking for is a parser, instead of a regular expression match. In your case, I would consider using a very simple parser, split():

s = "VALUE 100 234 568 9233 119"
a = s.split()  # split on runs of whitespace
if a[0] == "VALUE":
    # everything after the keyword is a number
    print([int(x) for x in a[1:]])

You can still use a regular expression (the one in your question) to check that an input line matches the expected format. Once a line has passed that check, you can run the code above without testing for "VALUE", and the int(x) conversion will always succeed, because you have already confirmed that every group after the keyword consists only of digits.
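The validate-then-split idea can be sketched roughly as follows. This is only an illustration under some assumptions: the sample log text, the function name first_values, and the pattern VALUE_LINE are made up for the example, and the pattern is a slightly relaxed variant of the regex in the question (it does not require whitespace after the last number).

```python
import re

# Hypothetical sample log; the "..." lines stand in for unrelated output.
LOG = """\
...
VALUE 100 234 568 9233 119
...
VALUE 101 124 9223 4329 1559
...
"""

# Assumed format: "VALUE" followed by one or more whitespace-separated integers.
VALUE_LINE = re.compile(r"VALUE(?:\s+\d+)+\s*")


def first_values(log):
    """Return the numbers on the first VALUE line as a tuple of strings, or None."""
    for line in log.splitlines():
        # The regex only validates the line; it does not do the capturing.
        if VALUE_LINE.fullmatch(line):
            # The format is confirmed, so split() yields the keyword plus digit strings.
            return tuple(line.split()[1:])
    return None


print(first_values(LOG))
```

Because the regex is used only as a yes/no check, the number of values per line does not matter; split() returns however many there are, and int(x) can be applied to each element afterwards if integers are wanted.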

Greg Hewgill answered Oct 11 '22 16:10