Is there a convenient way to write a regex that will try to match as much of the regex as possible?
Example:
my $re = qr/a ([a-z]+) (\d+)/;
match_longest($re, "a") => ()
match_longest($re, "a word") => ("word")
match_longest($re, "a word 123") => ("word", "123")
match_longest($re, "a 123") => ()
That is, $re is considered to be a sequence of regular expressions, and match_longest attempts to match as much of this sequence as possible. In a sense, matching never fails; it's only a question of how much of the sequence matched. Once a regex match fails, undef is returned for the parts that didn't match.
I know I could write a function which takes a sequence of regexes and creates a single regex to do the job of match_longest. Here's an outline of the idea:
Suppose you have three regexes: $r1, $r2 and $r3. The single regex to perform the job of match_longest would have the following structure:
$r = ($r1 $r2 $r3)? | $r1 ($r2 $r3)? | $r1 $r2 $r3?
Unfortunately, this is quadratic in the number of regexes. Is it possible to be more efficient?
You can use the regex
$r = ($r1 ($r2 ($r3)?)?)?
which contains each regex only once. You can also use non-capturing groups (?:...) here, so that the added grouping does not interfere with the capture groups in your original regular expressions.
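As a rough sketch of how that nesting could be generated from a list of regexes: the helper names build_longest and match_longest_list below are hypothetical, and the sketch assumes each later regex carries its own leading separator (here a space) and that matching should start at the beginning of the string.

```perl
use strict;
use warnings;

# Hypothetical helper: nest the given regexes as (?:r1(?:r2(?:r3)?)?)?
# so that each one appears exactly once in the combined pattern.
sub build_longest {
    my @parts = @_;
    my $nested = '';
    # Wrap from the last regex outwards; qr// objects stringify with their flags.
    for my $part (reverse @parts) {
        $nested = "(?:$part$nested)?";
    }
    return qr/^$nested/;    # anchoring at the start is an assumption
}

# Hypothetical helper: return only the captures that took part in the match.
sub match_longest_list {
    my ($re, $str) = @_;
    my @caps = $str =~ $re;          # unmatched groups come back as undef
    return grep { defined } @caps;
}

my $re = build_longest(qr/a/, qr/ ([a-z]+)/, qr/ (\d+)/);
for my $input ("a", "a word", "a word 123", "a 123") {
    print "$input => (", join(", ", match_longest_list($re, $input)), ")\n";
}
# a => ()
# a word => (word)
# a word 123 => (word, 123)
# a 123 => ()
```

The combined pattern grows linearly with the number of regexes, since each one is wrapped in a single extra optional group.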
If I understand the question, using nested groups with ? should work:
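# Each space sits inside its optional group, so "a" and "a word" still match;
# [a-z]+ (as in the question) keeps "123" from being captured as the word.
my $re = qr/a(?: ([a-z]+)(?: (\d+))?)?/;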
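To check what a nested pattern of this kind captures, it can be run against the four inputs from the question. The pattern below is an assumed variant (separating spaces inside the optional groups, [a-z]+ for the word, anchored at the start), and show_captures is a hypothetical helper that makes the undef placeholders visible.

```perl
use strict;
use warnings;

# Assumed variant of the nested pattern; not necessarily the answerer's exact regex.
my $re = qr/^a(?: ([a-z]+)(?: (\d+))?)?/;

# Hypothetical helper: render the capture list, showing undef explicitly.
sub show_captures {
    my ($str) = @_;
    # In list context a successful match returns one value per capture group;
    # groups that did not take part in the match come back as undef.
    my @caps = $str =~ $re;
    return join ", ", map { defined $_ ? "'$_'" : 'undef' } @caps;
}

print "$_ => (", show_captures($_), ")\n" for "a", "a word", "a word 123", "a 123";
# a => (undef, undef)
# a word => ('word', undef)
# a word 123 => ('word', '123')
# a 123 => (undef, undef)
```

Filtering the result with grep { defined } gives the shorter lists shown in the question.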