
How does preg_match_all() process strings?

I'm still learning a lot about PHP, and string manipulation is something that interests me. I've used preg_match before for things like validating an email address or running simple searches.

I just came from the post "What's wrong in my regular expression?" and was curious why the preg_match_all function produces 2 sets of strings: one with only some of the characters stripped, and the other with the desired output.

From what I understand, the function goes over the string character by character, using the RegEx to decide what to do with it. Could this RegEx have been structured in such a way as to bypass the first array entry and just produce the desired result?

So you don't have to go to the other thread, here is the code:

$str = 'text^name1^Jony~text^secondname1^Smith~text^email1^example-
        [email protected]~';

preg_match_all('/\^([^^]*?)\~/', $str, $newStr);

for($i=0;$i<count($newStr[0]);$i++)
{
    echo $newStr[0][$i].'<br>';
}

echo '<br><br><br>';

for($i=0;$i<count($newStr[1]);$i++)
{
    echo $newStr[1][$i].'<br>';
} 

This will output

^Jony~
^Smith~
^[email protected]~


Jony
Smith
[email protected]

I'm curious whether the 2 array entries are due to the original syntax of the string or are simply the normal behavior of the function. Sorry if this shouldn't be here, but I'm really curious as to how this works.

thanks, Brodie

Brodie asked Oct 19 '11 21:10


2 Answers

It's standard behavior for preg_match and preg_match_all: the first entry in the "matched values" array is the FULL text that was matched by the regex pattern. The subsequent entries are the 'capture groups', whose existence depends on the placement of () pairs in the regex pattern. With preg_match_all, each entry is itself an array holding one value per match found.

In your regex's case, /\^([^^]*?)\~/, the first full match would be

^   Jony    ~
|     |     |
^  ([^^]*?) ~   -> $newStr[0][0] = ^Jony~
                -> $newStr[1][0] = Jony (due to the `()` capture group).
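
As a rough illustration of that layout, here is a minimal sketch using the pattern from the question. The input is a shortened sample, and its third field (`city1^Paris`) is a made-up placeholder rather than the email from the original string. The second call uses the PREG_SET_ORDER flag, which groups the results per match instead of per capture group; that may be easier to loop over, though it is only one option.

```php
<?php
// Shortened sample input; the third field is a made-up placeholder.
$str = 'text^name1^Jony~text^secondname1^Smith~text^city1^Paris~';

// Default ordering (PREG_PATTERN_ORDER):
//   $m[0] holds every full match, $m[1] holds every capture of group 1.
$count = preg_match_all('/\^([^^]*?)\~/', $str, $m);

print_r($m[0]); // full matches, delimiters included
print_r($m[1]); // group 1 only, delimiters excluded

// PREG_SET_ORDER: one sub-array per match, [0] = full match, [1] = group 1.
preg_match_all('/\^([^^]*?)\~/', $str, $sets, PREG_SET_ORDER);
foreach ($sets as $set) {
    echo $set[0], ' => ', $set[1], PHP_EOL;
}
```

If only the inner text is needed, reading `$m[1]` and ignoring `$m[0]` is enough; the pattern itself does not have to change.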
Marc B answered Oct 17 '22 00:10


Could this RegEx have been structured in such a way as to bypass the first array entry and just produce the desired result?

Absolutely. Use lookaround assertions, which check for the ^ and ~ delimiters without including them in the match. This regex:

preg_match_all('/(?<=\^)[^^]*?(?=~)/', $str, $newStr);

Results in:

Array
(
    [0] => Array
        (
            [0] => Jony
            [1] => Smith
            [2] => [email protected]
        )

)
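
To compare the two approaches side by side, a sketch along these lines can be run. The input is a shortened sample whose third field (`city1^Paris`) is a made-up placeholder, and the variable names `$look` and `$group` are chosen here only for illustration.

```php
<?php
// Shortened sample input; the third field is a made-up placeholder.
$str = 'text^name1^Jony~text^secondname1^Smith~text^city1^Paris~';

// Lookbehind (?<=\^) and lookahead (?=~) assert the delimiters without
// consuming them, so the full match at index 0 is already the inner text.
preg_match_all('/(?<=\^)[^^]*?(?=~)/', $str, $look);

// Capture-group version from the question, kept for comparison.
preg_match_all('/\^([^^]*?)\~/', $str, $group);

// The lookaround result has no index 1, because the pattern has no group.
print_r($look);

// Check whether both approaches extract the same list of values.
var_dump($look[0] === $group[1]);
```

Note that the result still contains an index 0 entry; the lookaround pattern only changes what that entry holds.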
nachito answered Oct 17 '22 00:10