Regex parsing from delimited string with sequential groups

I'm trying to parse the words out of a delimited string and get them in sequentially numbered capture groups. For example:

dog.cat.chicken.horse.whale

I know of ([^.]+), which parses out each word, but it produces a separate match per word and puts every word in capture group 1:

Match 1
Full match  0-3 `dog`
Group 1.    0-3 `dog`
Match 2
Full match  4-7 `cat`
Group 1.    4-7 `cat`
Match 3
Full match  8-15    `chicken`
Group 1.    8-15    `chicken`
Match 4
Full match  16-21   `horse`
Group 1.    16-21   `horse`
Match 5
Full match  22-27   `whale`
Group 1.    22-27   `whale`

What I really need is a single match with one group per word, something like:

Match 1
Full match  0-27    `dog.cat.chicken.horse.whale`
Group 1.    0-3 `dog`
Group 2.    4-7 `cat`
Group 3.    8-15    `chicken`
Group 4.    16-21   `horse`
Group 5.    22-27   `whale`

I've tried multiple variations with no success. Does anyone know how to do this?

  • I'm using these regular expressions in Prometheus' configuration for relabeling metrics. More info here: https://medium.com/quiq-blog/prometheus-relabeling-tricks-6ae62c56cbda
Richard Oswald asked Jan 12 '18 17:01


1 Answer

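There is no good solution for this case: a repeated capturing group only keeps its last match, so the number of groups has to be written into the pattern itself. All you can do is add optional non-capturing groups, each wrapping a capturing group, to cover some fixed maximum number of words.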

So, it might look like

([^.]+)\.([^.]+)\.([^.]+)\.([^.]+)\.([^.]+)(?:\.([^.]+))?(?:\.([^.]+))?(?:\.([^.]+))?

and so on: just add more (?:\.([^.]+))? until you reach whatever limit you define.

See the regex demo.
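
If writing out the repetitions by hand gets tedious, the pattern can be generated. Below is a minimal sketch in Python, not something Prometheus provides: `build_pattern` and its `required` / `optional` parameters are names I made up for illustration, and Python's `re` is used only because it treats this particular pattern the same way.

```python
import re

def build_pattern(required: int, optional: int) -> str:
    """Build a pattern with `required` mandatory words followed by
    up to `optional` extra words, each in its own capture group.
    (Hypothetical helper, named here for illustration only.)"""
    word = r"([^.]+)"
    # Mandatory words, separated by literal dots.
    head = r"\.".join([word] * required)
    # Each optional word brings its own leading dot inside a non-capturing group.
    tail = r"(?:\.([^.]+))?" * optional
    return head + tail

pattern = build_pattern(required=5, optional=3)
match = re.search(pattern, "dog.cat.chicken.horse.whale")
groups = match.groups()
print(groups)
# ('dog', 'cat', 'chicken', 'horse', 'whale', None, None, None)
```

Groups that did not take part in the match come back empty (`None` in Python), so whatever consumes the groups has to tolerate that.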

Note that you might want to anchor the pattern to avoid partial matches:

^([^.]+)\.([^.]+)\.([^.]+)\.([^.]+)\.([^.]+)(?:\.([^.]+))?(?:\.([^.]+))?(?:\.([^.]+))?$

The ^ matches the start of the string and $ asserts the position at the end of the string.
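
To see what the anchors change, here is a small Python sketch comparing the two patterns above on an input with one word more than the pattern allows. The names `LOOSE`, `ANCHORED` and `too_long` are mine. My understanding is that Prometheus anchors relabeling regexes on both ends by itself, so there the explicit ^ and $ may be redundant, but they matter in most other tools.

```python
import re

# The unanchored pattern from above: 5 required words, up to 3 optional ones.
LOOSE = (r"([^.]+)\.([^.]+)\.([^.]+)\.([^.]+)\.([^.]+)"
         r"(?:\.([^.]+))?(?:\.([^.]+))?(?:\.([^.]+))?")
ANCHORED = "^" + LOOSE + "$"

too_long = "a.b.c.d.e.f.g.h.i"  # nine words, one more than the pattern covers

# Without anchors the regex matches the first eight words and ignores the rest.
loose_match = re.search(LOOSE, too_long)
print(loose_match.group(0))  # a.b.c.d.e.f.g.h

# With anchors the whole string must fit, so there is no match at all.
anchored_match = re.search(ANCHORED, too_long)
print(anchored_match)  # None
```

Whether a silent partial match or no match is preferable depends on the use case, but the anchored form at least makes the limit visible.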

Wiktor Stribiżew answered Sep 23 '22 17:09