Perl regex vs. Raku regex, differences in the engine?

Tags: regex, raku

I am trying to convert a regex-based solution for the knapsack problem from Perl to Raku. The details are on PerlMonks.

The Perl solution creates this regex:

(?<P>(?:vvvvvvvvvv)?)
(?<B>(?:vv)?)
(?<Y>(?:vvvv)?)
(?<G>(?:vv)?)
(?<R>(?:v)?)
0
(?=
(?(?{ $1 })wwww|)
(?(?{ $2 })w|)
(?(?{ $3 })wwwwwwwwwwww|)
(?(?{ $4 })ww|)
(?(?{ $5 })w|)
)

which gets matched against vvvvvvvvvvvvvvvvvvv0wwwwwwwwwwwwwww. After that the match hash %+ contains the items to put in the sack.
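As a minimal sketch of how that match can be reproduced, the script below applies the regex above, written out literally, to the same input and lists the named groups that captured something. The helper name `knapsack_items` is hypothetical and not part of the original solution, which generates the regex from the item data instead.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the names
# of the items whose named group captured a non-empty string.
sub knapsack_items {
    my ($str) = @_;
    return unless $str =~ /
        (?<P>(?:vvvvvvvvvv)?)
        (?<B>(?:vv)?)
        (?<Y>(?:vvvv)?)
        (?<G>(?:vv)?)
        (?<R>(?:v)?)
        0
        (?=
            (?(?{ $1 })wwww|)
            (?(?{ $2 })w|)
            (?(?{ $3 })wwwwwwwwwwww|)
            (?(?{ $4 })ww|)
            (?(?{ $5 })w|)
        )
    /x;

    # %+ holds every named group; items left out captured the empty string.
    my @chosen = grep { length $+{$_} } sort keys %+;
    return @chosen;
}

# 19 v's (total value), a 0 separator, 15 w's (maximum weight).
my $str = ('v' x 19) . '0' . ('w' x 15);
print join(' ', knapsack_items($str)), "\n";   # prints "B G P R"
```

The match is unanchored, so the engine tries the leftmost start position first. Each position it has to skip costs one unit of value, which is why the first successful match is also the most valuable selection that fits the weight limit.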

My Raku conversion is:

$<B> = [ [ vv ]? ]
$<P> = [ [ vvvvvvvvvv ]? ]
$<R> = [ [ v ]? ]
$<Y> = [ [ vvvv ]? ]
$<G> = [ [ vv ]? ]
0
<?before
[ { say "B"; say $/<B>; say $0; say $1; $1 } w || { "" } ]
[ { say "P"; say $/<P>; say $0; say $1; $2 } wwww || { "" } ]
[ { say "R"; say $/<R>; say $0; say $1; $3 } w || { "" } ]
[ { say "Y"; say $/<Y>; say $0; say $1; $4 } wwwwwwwwwwww || { "" } ]
[ { say "G"; say $/<G>; say $0; say $1; $5 } ww || { "" } ]
>

which also matches vvvvvvvvvvvvvvvvvvv0wwwwwwwwwwwwwww. But the match object, $/, does not contain anything useful. Also, every debug say call prints Nil, so at that point the backreferences do not seem to work?

Here's my test script:

my $max-weight = 15;
my %items      =
    'R' => { w =>  1, v =>  1 },
    'B' => { w =>  1, v =>  2 },
    'G' => { w =>  2, v =>  2 },
    'Y' => { w => 12, v =>  4 },
    'P' => { w =>  4, v => 10 }
;

my $str = 'v' x  %items.map(*.value<v>).sum ~
          '0' ~
          'w' x  $max-weight;

say $str;

my $i = 0;
my $left = my $right = '';

for %items.keys -> $item-name
{
    my $v = 'v' x %items{ $item-name }<v>;
    my $w = 'w' x %items{ $item-name }<w>;

    $left  ~= sprintf( '$<%s> = [ [ %s ]? ] ' ~"\n", $item-name, $v );
    $right ~= sprintf( '[ { say "%s"; say $/<%s>; say $0; say $1; $%d } %s || { "" } ]' ~ "\n", $item-name, $item-name, ++$i, $w );
}
use MONKEY-SEE-NO-EVAL;

my $re = sprintf( '%s0' ~ "\n" ~ '<?before ' ~ "\n" ~ '%s>' ~ "\n", $left, $right );

say $re;
dd $/ if $str ~~ m:g/<$re>/;
asked Nov 28 '19 15:11 by Holli

1 Answer

This answer only covers what's going wrong. It does not address a solution. I have not filed corresponding bugs. I have not yet even searched the bug queues to see if I can find reports corresponding to either or both of the two issues I've surfaced.

my $lex-var;

sub debug { .say for ++$, :$<rex-var>, :$lex-var }

my $regex = / $<rex-var> = (.) { $lex-var = $<rex-var> } <?before . { debug }> / ;

'xx' ~~   $regex;     say $/;
'xx' ~~ / $regex /;   say $/;

displays:

1
rex-var => Nil
lex-var => 「x」
「x」
 rex-var => 「x」
2
rex-var => Nil
lex-var => 「x」
「x」

Focusing first on the first call of debug (the lines starting with 1 and ending at rex-var => 「x」), we can see that:

  • Something's gone awry during the call to debug: $<rex-var> is reported as having the value Nil.

  • When the regex match is complete and we return to the mainline, the say $/ reports a full and correctly populated result that includes the rex-var named match.

To begin to get a sense of what's gone wrong, please consider reading the bulk of my answer to another SO question. You can safely skip its "Using ~" section. Footnotes 1, 2, and 6 are also probably completely irrelevant to your scenario.

For the second match, we see that not only is $<rex-var> reported as being Nil during the debug call, the final match variable, as reported back in the mainline with the second say $/, is also missing the rex-var match. And the only difference is that the regex $regex is called from within an outer regex.

answered Oct 14 '22 03:10 by raiph