
How to split string to 2D array with Regex?

I've got a problem that seems simple on the face of it but has defeated my meager regex skills. I have a string that I need to convert to an array and then process the values accordingly, which is simple enough, but the format of the string cannot be changed (it is generated elsewhere) and the logic of it has me baffled.

The string is:

[6] [2] [3] 12.00; [5] [4]

It's basically a set of ids, each optionally followed by a decimal value (in this case id 3 has the value 12.00). The number of ids can change at any time, and a decimal value may follow any or all of the ids.

In an ideal world I would have the following array:

Array (
   [0] => Array (
             [id]  => 6
             [num] => 
          )
   [1] => Array (
             [id]  => 2
             [num] => 
          ) 
   [2] => Array (
             [id]  => 3
             [num] => 12.00 
          )
   Etc...

Do any of you regex wizards know how this can be accomplished with less swearing than I've been able to achieve?

So far I have been able to extract the ids using:

preg_match_all('@\[(.*?)\]@s', $string, $array);

and the decimals using:

preg_match_all('/([0-9]+[,\.]{1}[0-9]{2})/', $string, $array);

but I lose the correlation between ids and values.

Matthew Chambers asked Nov 15 '11 17:11


3 Answers

Example:

<?php

$string = '[6] [2] [3] 12.00; [5] [4]';

preg_match_all('/\[(?P<id>\d+)\](?: (?P<num>[\d\.]+);)?/', $string, $matches, PREG_SET_ORDER);

var_dump($matches);

Output:

array(5) {
  [0]=>
  array(3) {
    [0]=>
    string(3) "[6]"
    ["id"]=>
    string(1) "6"
    [1]=>
    string(1) "6"
  }
  [1]=>
  array(3) {
    [0]=>
    string(3) "[2]"
    ["id"]=>
    string(1) "2"
    [1]=>
    string(1) "2"
  }
  [2]=>
  array(5) {
    [0]=>
    string(10) "[3] 12.00;"
    ["id"]=>
    string(1) "3"
    [1]=>
    string(1) "3"
    ["num"]=>
    string(5) "12.00"
    [2]=>
    string(5) "12.00"
  }
  [3]=>
  array(3) {
    [0]=>
    string(3) "[5]"
    ["id"]=>
    string(1) "5"
    [1]=>
    string(1) "5"
  }
  [4]=>
  array(3) {
    [0]=>
    string(3) "[4]"
    ["id"]=>
    string(1) "4"
    [1]=>
    string(1) "4"
  }
}
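
The dump above is close to the array the question asks for, but the "num" key is missing whenever no value follows an id. A minimal sketch of reshaping the matches into id/num rows could look like the following. The helper name parseIdNumPairs is hypothetical (it is not part of the original answer), and the code assumes PHP 7 or later for the ?? operator and the type declarations.

```php
<?php

// Hypothetical helper, not part of the original answer.
function parseIdNumPairs(string $string): array
{
    preg_match_all(
        '/\[(?P<id>\d+)\](?: (?P<num>[\d\.]+);)?/',
        $string,
        $matches,
        PREG_SET_ORDER
    );

    $rows = [];
    foreach ($matches as $match) {
        $rows[] = [
            'id'  => $match['id'],
            // With PREG_SET_ORDER, a trailing group that did not take part
            // in the match is left out of the array, so default it to ''.
            'num' => $match['num'] ?? '',
        ];
    }

    return $rows;
}

$rows = parseIdNumPairs('[6] [2] [3] 12.00; [5] [4]');
print_r($rows);
```

Each row then always has both keys, so the processing code does not need to check whether "num" exists.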
Francois Deschenes answered Oct 20 '22 02:10


If you are happy with a list of either IDs or NUMs, then you could just combine your two working regexes into one call:

preg_match_all('@  \[(?P<id> \d+ )]   |   (?P<num> [\d,.]+)  @xs',
         $string, $array, PREG_SET_ORDER);

With the PREG_SET_ORDER flag, this gives you a list of associative arrays, one per token, each with either id or num filled in. Note that when num matches, the id key is still present but holds an empty string.
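
Because this approach returns ids and nums as one flat list, the pairing has to be rebuilt afterwards. One possible sketch, assuming a value always belongs to the id immediately before it (as in the sample string), is shown below. The variable names $tokens and $pairs are illustrative only.

```php
<?php

$string = '[6] [2] [3] 12.00; [5] [4]';

preg_match_all(
    '@  \[(?P<id> \d+ )]   |   (?P<num> [\d,.]+)  @xs',
    $string,
    $tokens,
    PREG_SET_ORDER
);

$pairs = [];
foreach ($tokens as $token) {
    if ($token['id'] !== '') {
        // An id token starts a new row that has no value yet.
        $pairs[] = ['id' => $token['id'], 'num' => ''];
    } elseif ($pairs !== []) {
        // A num token is attached to the most recently seen id.
        $pairs[count($pairs) - 1]['num'] = $token['num'];
    }
}

print_r($pairs);
```

A value that appears before any id is silently skipped in this sketch; whether that is the right behaviour depends on how the string is generated.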

mario answered Oct 20 '22 02:10


Something like this? My PHP skills are rather weak, so you will have to check how to access the named capturing groups id and num.

preg_match_all('/\[(?P<id>\d+)\]\s*(?P<num>[-+]?\b[0-9]+(?:\.[0-9]+)?\b)?/', $subject, $result, PREG_SET_ORDER);
for ($matchi = 0; $matchi < count($result); $matchi++) {
    for ($backrefi = 0; $backrefi < count($result[$matchi]); $backrefi++) {
        # Matched text = $result[$matchi][$backrefi];
    } 
}

How it works:

"
\[             # Match the character “[” literally
(?P<id>        # Match the regular expression below and capture its match into backreference with name “id”
   \d             # Match a single digit 0..9
      +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\]             # Match the character “]” literally
\s             # Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
   *              # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
(?P<num>       # Match the regular expression below and capture its match into backreference with name “num”
   [-+]           # Match a single character present in the list “-+”
      ?              # Between zero and one times, as many times as possible, giving back as needed (greedy)
   \b             # Assert position at a word boundary
   [0-9]          # Match a single character in the range between “0” and “9”
      +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   (?:            # Match the regular expression below
      \.             # Match the character “.” literally
      [0-9]          # Match a single character in the range between “0” and “9”
         +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   )?             # Between zero and one times, as many times as possible, giving back as needed (greedy)
   \b             # Assert position at a word boundary
)?             # Between zero and one times, as many times as possible, giving back as needed (greedy)
"

It also takes care of negative values.
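
For reference, the named groups can be read straight from each match array by key. The sketch below is one way to do that, assuming PHP 7 or later. The input string with a negative value is made up for illustration, and converting the captures to int and float is a choice of this sketch rather than something the question requires.

```php
<?php

// Made-up input: the sample string from the question with a negative value.
$subject = '[6] [2] [3] -12.50; [5] [4]';

preg_match_all(
    '/\[(?P<id>\d+)\]\s*(?P<num>[-+]?\b[0-9]+(?:\.[0-9]+)?\b)?/',
    $subject,
    $result,
    PREG_SET_ORDER
);

$parsed = [];
foreach ($result as $match) {
    $parsed[] = [
        // Named groups are available under their names as array keys.
        'id'  => (int) $match['id'],
        // 'num' is absent when no value follows the id, so use null then.
        'num' => isset($match['num']) ? (float) $match['num'] : null,
    ];
}

var_dump($parsed);
```

Using null for a missing value keeps it distinguishable from a real value of 0.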

FailedDev answered Oct 20 '22 00:10