
How to work around PHP lookbehind fixed width limitation?

Tags: regex, php

I ran into a problem when trying to match all numbers found between specific words on my page. How would you match all the numbers in the following text, but only those between the words "begin" and "end"?

11
a
b
13
begin
t
899
y
50
f
end
91
h

This works:

preg_match("/begin(.*?)end/s", $text, $out);
preg_match_all("/[0-9]{1,}/", $out[1], $result);

But can it be done in one expression?

I tried this, but it doesn't do the trick:

preg_match_all("/begin.*([0-9]{1,}).*end/s", $text, $out);
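
For what it's worth, here is a small sketch of what that attempt does on the sample text above. The pattern matches the whole begin ... end span exactly once, so the group captures only once, and because the first .* is greedy, the group is left with just the last digit before "end". The $text nowdoc is simply the sample listing retyped.

```php
<?php
// Sample text from the listing above.
$text = <<<'TXT'
11
a
b
13
begin
t
899
y
50
f
end
91
h
TXT;

// The single-expression attempt: one overall match, one capture.
$count = preg_match_all("/begin.*([0-9]{1,}).*end/s", $text, $out);

var_export($count);  // 1
var_export($out[1]); // a single capture: '0' (the last digit of 50)
```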
Kristian Rafteseth asked Mar 05 '14 10:03



1 Answer

You can use the \G anchor like this, together with some lookaheads that make sure you're not going 'out of territory' (outside the area between the two words):

(?:begin|(?!^)\G)(?:(?=(?:(?!begin).)*end)\D)*?(\d+)
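
To try the pattern from PHP, a minimal sketch might look like the following. The $text nowdoc is the sample from the question. The s modifier is an assumption on my part: the lookahead uses a dot, which needs s to see past the newlines to "end" in multi-line input.

```php
<?php
// Sample input from the question.
$text = <<<'TXT'
11
a
b
13
begin
t
899
y
50
f
end
91
h
TXT;

// Assumption: "s" is added so that "." in the lookahead also matches newlines.
$pattern = '/(?:begin|(?!^)\G)(?:(?=(?:(?!begin).)*end)\D)*?(\d+)/s';

preg_match_all($pattern, $text, $out);

print_r($out[1]); // the captured numbers: 899 and 50
```

Each number is a separate match here, which is why preg_match_all collects them all in $out[1].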


(?:                  # Begin of first non-capture group
  begin              # Match 'begin'
|                    # Or
  (?!^)\G            # Continue where the previous match ended (but not at the start of the string)
)                    # End of first non-capture group
(?:                  # Second non-capture group
  (?=                # Positive lookahead
    (?:(?!begin).)*  # Negative lookahead to prevent running into another 'begin'
    end              # And make sure that there's an 'end' ahead
  )                  # End positive lookahead
  \D                 # Match non-digits
)*?                  # Second non-capture group repeated zero or more times, lazily
(\d+)                # Capture digits
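
The annotated form can also be used more or less as written: with PCRE's x (extended) modifier, whitespace and # comments inside the pattern are ignored. Below is a sketch of that. The helper name numbersBetween and the test string are made up for illustration, and the s modifier is again my assumption for multi-line input.

```php
<?php
// Hypothetical helper: returns every number found between "begin" and "end".
function numbersBetween(string $text): array
{
    // x: ignore whitespace and # comments in the pattern; s: "." matches newlines.
    $pattern = <<<'REGEX'
/
(?:
  begin                # literal 'begin'
|
  (?!^)\G              # or resume where the previous match ended
)
(?:
  (?= (?:(?!begin).)* end )   # an 'end' lies ahead with no 'begin' before it
  \D                   # consume one non-digit
)*?
(\d+)                  # capture the next run of digits
/xs
REGEX;

    preg_match_all($pattern, $text, $m);
    return $m[1];
}

// Two closed begin/end pairs and one 'begin' without an 'end'.
$found = numbersBetween('1 begin 2 x 3 end 4 begin 5 end 6 begin 7');
print_r($found); // 2, 3 and 5; the other numbers lie outside a begin/end pair
```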


Jerry answered Oct 23 '22 04:10
