My data:
stack: 123 overflow: 456 others: - st: 7 ov: 7 againothers: - m: 11 t: 12 - m: 13 t: 14 - m: 15 t: 16 - st: 8 ov: 8 againothers: - m: 17 t: 18 end: 42
My regular expression:
^stack: (\d+) overflow: (\d+) others: ?(.+) end: (\d+)$
Which matches the groups as:
1: 123
2: 456
3: - st: 7 ov: 7 againothers: - m: 11 t: 12 - m: 13 t: 14 - m: 15 t: 16 - st: 8 ov: 8 againothers: - m: 17 t: 18
4: 42
Good so far. On group 3 then run the following regular expression:
^(?:- st: (\d+) ov: (\d+) againothers: ?(?: - m: (\d+) t: (\d+))+)+$
That does not work at all (why?), so I remove the ^ and $ and it matches. The match then looks like this:
1: 7 // <-- Works as expected.
2: 7
3: 15 // <-- Here I'd expected 2 groups matching: (13,14), (15,16)
4: 16 // <-- but I'm only getting the last group.
1: 8 // <-- This works and the remainder is as expected.
2: 8
3: 17
4: 18
I seem to be missing "13, 14" from my inner group, which should match one or more (?: - m: (\d+) t: (\d+))+ combinations.
Online test: http://gskinner.com/RegExr/?33urf, in case that gets butchered, my data there is: - st: 7 ov: 7 againothers: - m: 11 t: 12 - m: 13 t: 14 - m: 15 t: 16 - st: 8 ov: 8 againothers: - m: 17 t: 18 and the regex is: (?:- st: (\d+) ov: (\d+) againothers: ?(?: - m: (\d+) t: (\d+))+)+.
I've read http://www.regular-expressions.info/captureall.html, and I think my problem is related to that? Any tips/pointers/help so I can match one or more m:t: combinations?
Most regex engines do not allow multiple captures from the same set of parentheses within a repeating group. If capturing parentheses match more than once, you get what matched last as the result.
The simplest work-around is to make a regex for only that sub-pattern and then get the results captured from each time it matches.
In other words, first get the relevant portion of the string and then use a regex like this on it:
/ - m: (\d+) t: (\d+)/
(Using whatever mechanism your language uses to match all).
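As a rough illustration of that two-step approach, here is a minimal sketch in Python (the question does not name a language, so that choice is an assumption). The names record_re, pair_re and parse_records are made up for this example, and the outer pattern is a slightly simplified variant of the one in the question: it captures the whole run of m/t pairs as a single string, which is then scanned with the sub-pattern.

```python
import re

# Content of group 3 from the question.
group3 = ("- st: 7 ov: 7 againothers: - m: 11 t: 12 - m: 13 t: 14 - m: 15 t: 16 "
          "- st: 8 ov: 8 againothers: - m: 17 t: 18")

# Step 1: one match per "- st: ... againothers: ..." record. The third group
# captures the whole run of " - m: N t: N" pairs as one string.
record_re = re.compile(r"- st: (\d+) ov: (\d+) againothers:((?: - m: \d+ t: \d+)+)")

# Step 2: the sub-pattern on its own, applied to that captured run.
pair_re = re.compile(r" - m: (\d+) t: (\d+)")


def parse_records(text):
    """Return one dict per record, with every (m, t) pair preserved."""
    records = []
    for st, ov, pairs in record_re.findall(text):
        mt = [(int(m), int(t)) for m, t in pair_re.findall(pairs)]
        records.append({"st": int(st), "ov": int(ov), "mt": mt})
    return records


records = parse_records(group3)
for record in records:
    print(record)
```

Because the sub-pattern is no longer inside a repeated group, each of its matches keeps its own captures, so the first record yields all three pairs rather than only the last one.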
Your groups get the following numbers:
^(?:- st: (\d+) ov: (\d+) againothers: ?(?: - m: (\d+) t: (\d+))+)+$
          1         2                            3        4
They are numbered by their opening brackets.
If this expression is now matched a second time, then the content from the capturing groups is overwritten.
You are repeating a capturing group.
As far as I know, in .NET it is possible to access all of those captures, but in most other regex implementations the group content is overwritten.
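To see the overwriting in action, here is a small sketch using Python's built-in re module (Python is an assumption; the question does not say which engine is in use). It runs the un-anchored pattern from the question over the group 3 data and collects the four numbered groups of each match. It also tries the anchored version, which appears to fail because the second record is preceded by a space that the outer group does not consume, so the outer group cannot repeat and $ is never reached.

```python
import re

data = ("- st: 7 ov: 7 againothers: - m: 11 t: 12 - m: 13 t: 14 - m: 15 t: 16 "
        "- st: 8 ov: 8 againothers: - m: 17 t: 18")

# The pattern from the question, without ^ and $.
pattern = r"(?:- st: (\d+) ov: (\d+) againothers: ?(?: - m: (\d+) t: (\d+))+)+"

# Groups 3 and 4 sit inside a repeated group, so each match keeps only
# the values from the last iteration of that inner group.
rows = [m.groups() for m in re.finditer(pattern, data)]
for row in rows:
    print(row)

# With the anchors, the single match would have to cover both records,
# but nothing in the pattern consumes the space before "- st: 8".
anchored = re.search("^" + pattern + "$", data)
print(anchored)
```

The first row holds 15 and 16 in groups 3 and 4, matching what the question reports: the pairs (11, 12) and (13, 14) were matched but then overwritten.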