
RegEx Challenge: Capture all the numbers in a specific row

Assume we have this text:

...
settingsA=9, 4.2 
settingsB=3, 1.5, 9, 2, 4, 6
settingsC=8, 3, 2.5, 1
...

How can I capture all the numbers in a specific row in a single step?

Single step means:

  • single regex pattern.
  • single operation (no loops or splits, etc.)
  • all matches are captured in one array.

Let's say I want to capture all the numbers in the row that starts with settingsB=. The final result should look like this:

3
1.5
9
2
4
6

My failed attempts:

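<?php
    $subject =
        "settingsA=9, 4.2
         settingsB=3, 1.5, 9, 2, 4, 6
         settingsC=8, 3, 2.5, 1";

    // Each assignment is a separate attempt; as written, only the last one is used.
    $pattern = '/([\d\.]+)(, )?/'; // FAILED! Matches the numbers in every row.
    $pattern = '/(?:settingsB=)(?:([\d\.]+)(?:, )?)/'; // FAILED! Captures only the first number.
    $pattern = '/(?:settingsB=)(?:([\d\.]+)(?:, )?)+/'; // FAILED! A repeated group keeps only its last capture.
    $pattern = '/(?<=^settingsB=|, )([\d+\.]+)/'; // FAILED! Matches after ", " in every row.

    preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
    if ($matches) {
        print_r($matches);
    }
?>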

UPDATE 1: @Saleem's example uses multiple steps instead of a single step, unfortunately. I'm not saying that his example is bad (it actually works), but I want to know if there is another way to do it and how. Any ideas?

UPDATE 2: @bobble bubble provided a perfect solution for this challenge.

OlavH asked Mar 06 '16 23:03


1 Answer

You can use the \G anchor to glue each match to the end of the previous match. The following pattern also uses \K to reset the start of the reported match just before the desired part, so it needs a regex flavor that supports both, such as PCRE.

(?:settingsB *=|\G(?!^) *,) *\K[\d.]+
  • (?: opens a non-capturing group for the alternation
  • settingsB *= matches settingsB, followed by any number of spaces, followed by a literal =
  • |\G(?!^) or continue where the previous match ended, but not at the start of the string
  •  *, then match a comma preceded by optional spaces
  • ) ends the alternation (non-capturing group)
  •  *\K matches optional spaces, then resets the start of the reported match
  • [\d.]+ matches one or more digits and periods.
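As a minimal sketch of how the pattern above might be applied in PHP, the following uses the sample text from the question. The '/' delimiters and the choice to read the result from $matches[0] are assumptions made for this illustration.

```php
<?php
// Sample text from the question, one setting per line.
$subject = "settingsA=9, 4.2\nsettingsB=3, 1.5, 9, 2, 4, 6\nsettingsC=8, 3, 2.5, 1";

// The pattern from the answer, wrapped in '/' delimiters (a choice made here).
// Single quotes keep \G, \K and \d intact for the regex engine.
$pattern = '/(?:settingsB *=|\G(?!^) *,) *\K[\d.]+/';

// With the default PREG_PATTERN_ORDER flag, $matches[0] lists every full match.
// Because of \K, each full match is only the number itself.
preg_match_all($pattern, $subject, $matches);

print_r($matches[0]); // the settingsB numbers as strings: 3, 1.5, 9, 2, 4, 6
```

The chain stops after 6 because the next character is a newline, not a comma, so \G has nothing to continue from and settingsB= does not occur again.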

If the sequence contains tabs or newlines, use \s (any whitespace character) instead of the literal space.
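A rough sketch of that \s variant, assuming a row whose values are separated by tabs and wrapped onto a second line. The input string is made up for illustration and does not come from the question.

```php
<?php
// Hypothetical input: tabs around the values, and the row continues on a new line.
$subject = "settingsB\t=\t3,\t1.5,\n\t9\nsettingsC=8, 3";

// Same structure as the pattern above, with each ' *' swapped for '\s*'.
$pattern = '/(?:settingsB\s*=|\G(?!^)\s*,)\s*\K[\d.]+/';

preg_match_all($pattern, $subject, $matches);

print_r($matches[0]); // 3, 1.5 and 9; the settingsC row is not reached
```

Note that \s lets the chain continue across a line break whenever a comma is still pending, which is what is wanted here but may be too permissive for other inputs.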

See demo at regex101 or PHP demo at eval.in

Alternatively, a more compatible pattern uses a capturing group instead of \K. It should work in any regex flavor that supports the \G anchor (Java, .NET, Ruby...).
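The capturing-group pattern itself is not reproduced here. One plausible form, which is an assumption and not confirmed by the answer, keeps the same structure and wraps the number in a group in place of \K. A sketch in PHP:

```php
<?php
// Sample text from the question, one setting per line.
$subject = "settingsA=9, 4.2\nsettingsB=3, 1.5, 9, 2, 4, 6\nsettingsC=8, 3, 2.5, 1";

// Assumed variant: \K is dropped and the number is captured in group 1.
$pattern = '/(?:settingsB *=|\G(?!^) *,) *([\d.]+)/';

preg_match_all($pattern, $subject, $matches);

// $matches[0] now includes the prefixes ("settingsB=3", ", 1.5", ...),
// so the numbers are read from group 1 instead.
print_r($matches[1]); // 3, 1.5, 9, 2, 4, 6
```

The trade-off is that the caller has to pick the group out of each match, whereas the \K version returns the numbers directly as the full matches.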

bobble bubble answered Oct 22 '22 10:10