
PHP Regular Expression - Repeating Match of a Group

Tags: regex, php
I have a string that may look something like this:

$r = 'Filed under: <a>Group1</a>, <a>Group2</a>';

Here is the regular expression I am using so far:

preg_match_all("/Filed under: (?:<a.*?>([\w|\d|\s]+?)<\/a>)+?/", $r, $matches);

I want the group inside the (?: ) to keep matching repeatedly, as designated by the +? at the end. But it just won't do it. ::sigh::

Any ideas? I know there has to be a way to do this in one regular expression instead of breaking it up.

Senica Gonzalez asked Feb 05 '10 03:02


1 Answer

Just for fun here's a regex that will work with a single preg_match_all:

'%(?:Filed under:\s*+|\G</a>)[^<>]*+<a[^<>]*+>\K[^<>]+%'

Or, in a more readable format:

'%(?:
      Filed[ ]under: # your sentinel string (the space needs [ ] in x mode)
    |
      \G             # NEXT MATCH POSITION
      </a>           # an end tag
  )
  [^<>]*+          # some non-tag stuff
  <a[^<>]*+>       # an opening tag
  \K               # RESET MATCH START
  [^<>]+           # the contents of the tag
%x'
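
To see it in action, here is a minimal sketch that runs the single-line pattern against the sample string from the question. The variable names ($pattern, $matches) are just illustrative, and the pattern uses [^<>]+ for the tag contents, as in the readable version:

```php
<?php
$r = 'Filed under: <a>Group1</a>, <a>Group2</a>';

// Single-quoted so PHP leaves \s, \G and \K untouched for PCRE.
$pattern = '%(?:Filed under:\s*+|\G</a>)[^<>]*+<a[^<>]*+>\K[^<>]+%';

// Each overall match is one tag's contents, so no capture group is needed.
preg_match_all($pattern, $r, $matches);

print_r($matches[0]);
```

For this input $matches[0] should hold Group1 and Group2, one entry per link.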

\G matches the position where the next match attempt would start, which is usually the spot where the previous successful match ended (but if the previous match was zero-length, it bumps ahead one more). That means the regex won't match a substring starting with </a> until after it's matched one starting with Filed under: at least once.
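
A rough way to check that chaining behaviour, assuming two made-up test strings of my own ($noSentinel and $interrupted are not from the question):

```php
<?php
$pattern = '%(?:Filed under:\s*+|\G</a>)[^<>]*+<a[^<>]*+>\K[^<>]+%';

// No sentinel anywhere: \G only holds at the starting offset of each attempt,
// so the </a> branch cannot kick off a chain in the middle of the string.
$noSentinel = 'See also: <a>Group1</a>, <a>Group2</a>';
$count = preg_match_all($pattern, $noSentinel, $none);

// Sentinel present, but a different tag interrupts the run of links:
// the chain stops there and the later <a> is not picked up.
$interrupted = 'Filed under: <a>A</a>, <a>B</a> and <b>x</b> <a>C</a>';
preg_match_all($pattern, $interrupted, $chain);

var_dump($count);     // number of matches in $noSentinel
print_r($chain[0]);   // matches found in $interrupted
```

The first string should produce no matches at all, and the second should stop after A and B, because once an attempt fails there is no earlier match end for \G to anchor to.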

After the sentinel string or an end tag has been matched, [^<>]*+<a[^<>]*+> consumes everything up to and including the next start tag. Then \K spoofs the start position so the match (if there is one) appears to start after the <a> tag (it's like a positive lookbehind, but more flexible). Finally, [^<>]+ matches the tag's contents and brings the match position up to the end tag so \G can match.
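
If \K is new to you, this stripped-down sketch isolates it. The sample string is mine, and PREG_OFFSET_CAPTURE is only there to show where the reported match begins:

```php
<?php
// The opening tag has to match, but \K drops it from the reported match.
preg_match('%<a[^<>]*+>\K[^<>]+%', '<a>Group1</a>', $m, PREG_OFFSET_CAPTURE);

// $m[0] is a [text, offset] pair describing the overall match.
print_r($m[0]);
```

The reported match should be Group1 starting at offset 3, i.e. right after the <a>, even though the regex engine actually started matching at offset 0. A lookbehind can't do this as easily, since the variable-length [^<>]* part isn't something PCRE lookbehinds generally accept.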

But, as I said, this is just for fun. If you don't have to do the job in one regex, you're better off with a multi-step approach like the one @codaddict used; it's more readable, more flexible, and more maintainable.
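
For comparison, a multi-step version might look something like this. It is my own sketch of the general idea, not @codaddict's actual code, and the patterns and variable names ($section, $groups) are assumptions:

```php
<?php
$r = 'Filed under: <a>Group1</a>, <a>Group2</a>';
$groups = [];

// Step 1: isolate the run of links that follows the sentinel string.
if (preg_match('%Filed under:\s*((?:<a[^<>]*>[^<>]+</a>[\s,]*)+)%', $r, $section)) {
    // Step 2: pull the text out of each link in that run.
    preg_match_all('%<a[^<>]*>([^<>]+)</a>%', $section[1], $m);
    $groups = $m[1];
}

print_r($groups);
```

Each regex does one small job here, so changing the separator or the tag format later only means touching one of them.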

\K reference
\G reference

EDIT: Although the references I gave are for the Perl docs, these features are supported by PHP, too--or, more accurately, by the PCRE lib. I think the Perl docs are a little better, but you can also read about this stuff in the PCRE manual.

Alan Moore answered Oct 07 '22 23:10