 

Have a regular expression keep matching as much as possible?

Tags:

regex

perl

Is there a convenient way to write a regex that will try to match as much of the regex as possible?

Example:

my $re = qr/a ([a-z]+) (\d+)/;

match_longest($re, "a") => ()
match_longest($re, "a word") => ("word")
match_longest($re, "a word 123") => ("word", "123")
match_longest($re, "a 123") => ()

That is, $re is treated as a sequence of regular expressions, and match_longest matches as much of that sequence as it can. In a sense, matching never fails; the only question is how much of it succeeded. Once one part fails to match, the captures for that part and for all later parts are undef.

I know I could write a function which takes a sequence of regexes and creates a single regex to do the job of match_longest. Here's an outline of the idea:

Suppose you have three regexes: $r1, $r2 and $r3. The single regex to perform the job of match_longest would have the following structure:

$r = $r1 $r2 $r3 | $r1 $r2 | $r1?

Unfortunately, this is quadratic in the number of regexes. Is it possible to be more efficient?
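For comparison, here is a rough Perl sketch of that quadratic construction, built from a list of sub-patterns. The helper name `longest_alternation` and the example parts are hypothetical, and the branch-reset group `(?|...)` (available since Perl 5.10) is an assumption added here, not part of the question: it makes every alternative number its captures from the same starting point, so the parts always land in $1, $2, ... whichever alternative matched. Alternatives are listed longest first because Perl takes the first alternative that succeeds, not the longest one.

```perl
use strict;
use warnings;

# Hypothetical helper: build one regex that matches the longest prefix of
# @parts that it can. Each alternative repeats the first $k parts, so the
# pattern holds n + (n-1) + ... + 1 copies in total (quadratic in n).
sub longest_alternation {
    my @parts = @_;
    my @alts;
    for my $k (reverse 1 .. @parts) {
        # Longest alternative first: Perl tries alternatives left to right
        # and stops at the first one that succeeds.
        push @alts, join '', @parts[0 .. $k - 1];
    }
    push @alts, '';    # empty alternative, so the match never fails
    my $src = join '|', @alts;
    # (?|...) is a branch reset: every alternative numbers its capture
    # groups from the same start, so the parts always fill $1, $2, ...
    return qr/(?|$src)/;
}

my $re = longest_alternation(qr/a/, qr/ ([a-z]+)/, qr/ (\d+)/);
my @caps = "a word" =~ $re;    # ("word", undef)
```

With three parts the generated pattern already contains six copies of them, which is the growth the question is trying to avoid.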

asked Aug 14 '11 15:08 by ErikR


2 Answers

You can use the regex

$r = ($r1 ($r2 ($r3)?)?)?

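which contains each regex only once, so the pattern grows linearly rather than quadratically. You can also use non-capturing groups (?:...) for the added nesting, so that it doesn't introduce extra capture groups and shift the numbering of the captures in your original regular expressions.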
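A rough Perl sketch of building that nested pattern from a list of sub-patterns; the helper name `nest_optional` and the example parts are hypothetical. It wraps from the innermost part outward with non-capturing groups, so each part appears once and keeps its own capture groups in order.

```perl
use strict;
use warnings;

# Hypothetical helper: turn ($r1, $r2, $r3) into (?:$r1(?:$r2(?:$r3)?)?)?
# by wrapping from the innermost part outward. Each part appears once,
# so the pattern grows linearly with the number of parts.
sub nest_optional {
    my @parts = @_;
    my $src = '';
    for my $part (reverse @parts) {
        # The greedy ? tries the group first and only skips it on failure.
        $src = "(?:$part$src)?";
    }
    return qr/$src/;
}

my $re = nest_optional(qr/a/, qr/ ([a-z]+)/, qr/ (\d+)/);
my @caps = "a word 123" =~ $re;    # ("word", "123")
```

Because the outermost `?` lets the whole pattern match the empty string, an unanchored match always succeeds at the first position it tries; anchor with `\A` or drop the outermost `?` if the first part must be present.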

answered Oct 25 '22 04:10 by Howard


If I understand the question, using nested groups with ? should work:

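# Keep each separating space inside its optional group, so "a word" still captures "word".
my $re = qr/a(?: ([a-z]+)(?: (\d+))?)?/;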
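To check the nested-group approach against the question's examples, here is a small sketch with a hypothetical `match_longest` wrapper that drops the captures of parts that did not match, mirroring the `=>` notation in the question. The regex keeps each separating space inside its optional group; if the space between the groups were mandatory, an input such as "a word" (no trailing space) would fail to capture "word".

```perl
use strict;
use warnings;

# The question's pattern, with each later part nested in an optional group.
# Each separating space sits inside its optional group, so "a word" can
# capture "word" without needing a trailing space.
my $re = qr/a(?: ([a-z]+)(?: (\d+))?)?/;

# Hypothetical wrapper mirroring the question's match_longest: run the match
# and keep only the captures whose parts actually matched.
sub match_longest {
    my ($regex, $str) = @_;
    my @caps = $str =~ $regex;    # () if the match fails outright
    return grep { defined } @caps;
}

for my $input ("a", "a word", "a word 123", "a 123") {
    my @got = match_longest($re, $input);
    print "$input => (@got)\n";
}
```

Using `[a-z]+` rather than `\w+` matters here: `\w` also matches digits, so "a 123" would otherwise capture "123" as the word.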

answered Oct 25 '22 03:10 by Matt Ball