 

Is a lookahead in a regular expression always non-capturing, or does it depend?

Tags:

regex

I've been reading some articles on non-capturing groups on this site and on the net (such as http://www.regular-expressions.info/brackets.html and http://www.asiteaboutnothing.net/regexp/regex-disambiguation.html, as well as the questions "What does the "?:^" regular expression mean?" and "What is a non-capturing group? What does a question mark followed by a colon (?:) mean?").

I am clear on the meaning of (?:foo). What I am unclear about is (?=foo). Is (?=foo) also always a non-capturing group, or does it depend?

Asked Jul 11 '12 at 14:07 by Jon Lyles



2 Answers

No, (?=foo) will not capture "foo". No look-around assertion (positive or negative lookahead, positive or negative lookbehind) captures anything by itself: it only checks whether the text is present (or absent) at the current position, without consuming it.
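A small PHP sketch of that claim, assuming PHP 7.1+ for the array destructuring in `foreach`. It runs each of the four kinds of look-around with `preg_match` and counts the entries in the result array: index 0 is always the whole match, so a count of 1 means no group was captured. The `$cases` and `$groupCounts` names are illustrative only.

```php
<?php
// Each pattern contains exactly one look-around and no explicit capturing group.
// Each subject is chosen so that its pattern matches.
$cases = [
    'positive lookahead'  => ['/X(?=\d)/',  'X1'],
    'negative lookahead'  => ['/X(?!\d)/',  'Xa'],
    'positive lookbehind' => ['/(?<=\d)X/', '1X'],
    'negative lookbehind' => ['/(?<!\d)X/', 'aX'],
];

$groupCounts = [];
foreach ($cases as $name => [$pattern, $subject]) {
    preg_match($pattern, $subject, $m);
    // $m[0] holds the whole match; any capturing group would add $m[1], $m[2], ...
    $groupCounts[$name] = count($m);
    printf("%-20s match=%s entries=%d\n", $name, $m[0], count($m));
}
```

Every line reports `match=X` and a single entry: the digit or letter that the assertion inspected is neither part of the match nor stored in a group.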

For example, the regex:

(X(?=\d+))

matches "X" only when it is followed by one or more digits. However, these digits are not part of match group 1.
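To see this directly in PHP, the sketch below runs that exact pattern once with `preg_match` on the subject `X123` (the same subject used in the demo further down). Only the whole match and group 1 come back, and both are just "X".

```php
<?php
$s = 'X123';
// Group 1 wraps "X" plus the lookahead, but the lookahead consumes nothing,
// so group 1 ends right after "X".
preg_match('/(X(?=\d+))/', $s, $m);
print_r($m);
```

The printed array has only the keys 0 and 1, both holding `X`; no entry contains `123`.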

If you do want the digits, put a capturing group inside the lookahead. For example, the regex:

(X(?=(\d+)))

still matches "X" only when it is followed by one or more digits, but this time the digits are captured in match group 2.

A PHP demo:

<?php
$s = 'X123';
preg_match_all('/(X(?=(\d+)))/', $s, $matches);
print_r($matches);
?>

will print:

Array
(
    [0] => Array
        (
            [0] => X
        )

    [1] => Array
        (
            [0] => X
        )

    [2] => Array
        (
            [0] => 123
        )

)
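One consequence worth noticing: a group captured inside a lookahead can lie outside the overall match. A minimal sketch using PHP's `PREG_OFFSET_CAPTURE` flag, which makes each entry a `[text, byte offset]` pair, shows the whole match covering only offset 0 while group 2 starts at offset 1.

```php
<?php
$s = 'X123';
// With PREG_OFFSET_CAPTURE each entry becomes [matched text, byte offset].
preg_match('/(X(?=(\d+)))/', $s, $m, PREG_OFFSET_CAPTURE);

[$whole, $wholeOffset] = $m[0];
[$digits, $digitsOffset] = $m[2];

// The overall match is just "X" at offset 0 (length 1) ...
printf("match %s at %d, length %d\n", $whole, $wholeOffset, strlen($whole));
// ... while group 2 begins at offset 1, past the end of the overall match.
printf("group 2 %s at %d\n", $digits, $digitsOffset);
```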
Answered Sep 29 '22 at 21:09 by Bart Kiers



Lookarounds are always non-capturing and zero-width.
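"Zero-width" is the part that separates (?=foo) from (?:foo): both are non-capturing, but only (?:foo) consumes text. A short PHP comparison using `preg_replace` on an illustrative subject `a1b2` makes the difference visible.

```php
<?php
$s = 'a1b2';

// (?=\d) matches the empty string just before each digit and consumes nothing,
// so "-" is inserted and the digits survive.
$lookahead = preg_replace('/(?=\d)/', '-', $s);

// (?:\d) is also non-capturing, but it consumes the digit, so the digit is replaced.
$nonCapturing = preg_replace('/(?:\d)/', '-', $s);

echo $lookahead, "\n";    // a-1b-2
echo $nonCapturing, "\n"; // a-b-
```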

Answered Sep 29 '22 at 22:09 by slackwing
