
"inconsistent" match result when using code block in regex [Raku]

While testing various aspects of regexes, I stumbled upon some strange and "inconsistent" behaviour. I was trying to use some code in a regex, but the same behaviour also shows up with an empty code block. What struck me most was the difference in the match result when I swapped the :g modifier for the :x modifier.

The following code snippets depict the "inconsistent" behaviour.

First without the code block:

use v6.d;

if "test1 test2 test3 test4" ~~ m:g/ (\w+) / {
    say ~$_ for $/.list;
}

Result:

test1
test2
test3
test4

Then with the :g modifier and the code block:

use v6.d;

if "test1 test2 test3 test4" ~~ m:g/ (\w+) {} / {
    say ~$_ for $/.list;
}

Result:

test4

And finally with the :x modifier and the code block:

use v6.d;

if "test1 test2 test3 test4" ~~ m:x(4)/ (\w+) {} / {
    say ~$_ for $/.list;
}

Result:

test1
test2
test3
test4

I expected the three results to be the same, so the second one came as an unpleasant surprise.

Is there an explanation for this behaviour?

jakar asked Mar 08 '20 11:03


1 Answer

TL;DR Issue filed by @jakar and fixed by jnthn.


(Rewritten after more testing and code spelunking.)

This looks to me (and presumably you) like a bug. $/ is somehow getting kiboshed when using :g and an embedded block.

This answer covers:

  • Zeroing in on the problem

  • Looking at the compiler's source code

  • Searching issue queues and/or filing a new issue

Zeroing in on the problem

my &debug = {;} # start off doing no debugging
$_ = 'aa';

say       m      / {debug 1} 'a' {debug 2} /; debug 3; # 「a」
say $/ if m      / {debug 1} 'a' {debug 2} /; debug 3; # 「a」

say       m:x(2) / {debug 1} 'a' {debug 2} /; debug 3; # (「a」 「a」)
say $/ if m:x(2) / {debug 1} 'a' {debug 2} /; debug 3; # (「a」 「a」)

say       m:g    / {debug 1} 'a' {debug 2} /; debug 3; # (「a」 「a」)
say $/ if m:g    / {debug 1} 'a' {debug 2} /; debug 3; # 「a」 <--- Uhoh

Now make debug say something useful and run the first pair (without a regex adverb):

&debug = { say $_, $/.WHICH } # Say location of object bound to `$/`

say       m      / {debug 1} 'a' {debug 2} /; debug 3; # 「a」
# 1Match|66118928
# 2Match|66118928
# 「a」
# 3Match|66118928

say $/ if m      / {debug 1} 'a' {debug 2} /; debug 3; # 「a」
# 1Match|66119072
# 2Match|66119072
# 「a」
# 3Match|66119072

The same simple result in both cases. The match process creates a Match object and sticks with the same one.

Now the two variations with the :x(2) adverb:

say       m:x(2) / {debug 1} 'a' {debug 2} /; debug 3; # (「a」 「a」)
# 1Match|66119936
# 2Match|66119936
# 1Match|66120080
# 2Match|66120080
# 1Match|66120224
# (「a」 「a」)
# 3List|67612624

say $/ if m:x(2) / {debug 1} 'a' {debug 2} /; debug 3; # (「a」 「a」)
# 1Match|66120368
# 2Match|66120368
# 1Match|66120512
# 2Match|66120512
# 1Match|66120656
# (「a」 「a」)
# 3List|67612672

This time the match process creates a Match object and sticks with it for one pass, then a second match object for a second pass, and finally a third match object for a third pass before it fails to match a third 'a' (and hence the corresponding debug 2 doesn't get called). At the end of the m.../.../ call it has created a List object and bound that to $/.

Next we run the first of the two :g cases:

say       m:g    / {debug 1} 'a' {debug 2} /; debug 3; # (「a」 「a」)
# 1Match|66119216
# 2Match|66119216
# 1Match|66119360
# 2Match|66119360
# 1Match|66119504
# (「a」 「a」)
# 3Match|66119504

Like the :x(2) case, we try a third time and fail. But this time the match process leaves $/ bound not to a List but to a Match object. And it's the one created in the third pass. (Which surprises me.)

Finally, there's the "Uhoh" case:

say $/ if m:g    / {debug 1} 'a' {debug 2} /; debug 3; # 「a」 <--- Uhoh
# 1Match|66119648
# 2Match|66119648
# 1Match|66119792
# 2Match|66119792
# 「a」
# 3Match|66119792

Remarkably, the expected third pass appears not to start.
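If object identity doesn't matter and you only want to know what kind of object $/ ends up bound to, a shorter probe is to print $/.^name after each variant. This is only a sketch along the lines of the tests above. The variable names are mine, and I deliberately don't show expected output, because what gets printed depends on whether your Rakudo includes the fix:

```raku
$_ = 'aa';

# Keep the return value so the match is not evaluated in sink context.
my $r-x = m:x(2) / 'a' {} /;
say ':x(2)        -> $/ is a ', $/.^name;

my $r-g = m:g / 'a' {} /;
say ':g           -> $/ is a ', $/.^name;

# The variant from the question: :g used as an `if` condition.
say ':g inside if -> $/ is a ', $/.^name if m:g / 'a' {} /;
```

If the three lines disagree about the type, you are looking at the behaviour described here.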

Looking at the compiler's source code

It's plausible that exploring the relevant source code would be valuable. I'll write about that here in case it's of interest to you or other readers and in case this is a bug and what I write is of interest to someone fixing it.

Afaict a code block in a regex leads to an AST node being generated here that inserts a sub-node ahead of the statements in the block that does a bind operation:

                    :op('bind'),

                    QAST::Var.new( :name('$/'), :scope('lexical') ),

                    QAST::Op.new(
                        QAST::Var.new( :name('$¢'), :scope('lexical') ),
                        :name('MATCH'),
                        :op('callmethod')
                    )

My read of the above is that it inserts code that binds the lexical $/ symbol to the result of a .MATCH method call on the object bound to the lexical symbol $¢, immediately prior to running the code in the block.
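One observable consequence of that bind, shown here as a minimal sketch rather than as a statement about compiler internals: inside an embedded block, $/ already reflects the match in progress, so captures made so far are visible. The @seen array is just an illustrative name of mine.

```raku
my @seen;

# Each embedded block runs as soon as the regex reaches it, and $/
# (hence $0, $1) is up to date for the match so far at that point.
'ab' ~~ / (\w) { @seen.push: ~$0 } (\w) { @seen.push: ~$1 } /;

say @seen;    # [a b]
```

So it makes sense that adding even an empty {} block changes what happens to $/, since the block is what triggers the bind.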

The doc has a section on $¢; I'll quote a sentence:

The main difference between $/ and $¢ is scope: the latter only has a value inside [a] regex

I'm left wondering why $¢ exists and what other differences there are.

Moving on...

I see there's a Raku-level .MATCH. But it barely does anything. So I presume the code that's relevant is here.

At this point I'll pause. I might continue further in a later edit.

Searching issue queues and/or filing a new issue

If someone comes up with an answer in the next few days demonstrating that what you've shown isn't a bug, or has already been filed as a bug, then fair enough.

Otherwise, please consider doing your own search of the issue queues and/or starting a new issue in whatever issue queue you consider the most appropriate (default to /rakudo/rakudo/issues).

I've already searched the four github.com issue queues I considered plausibly relevant as part of writing this answer:

  • https://github.com/rakudo/rakudo/issues

  • https://github.com/Raku/old-issue-tracker/issues

  • https://github.com/perl6/nqp/issues

  • https://github.com/moarvm/moarvm/issues

I searched for two keywords that I hoped might uncover an existing issue ("global" and "publish"). No matching issues were relevant. Perhaps you can also look for other keywords that you think a filer might use.

If you do file an issue, please consider adding your tests, or mine or some other variant, converted into standard roast test cases, if you know how to do that.
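As a starting point, here is a rough sketch of what such a test could look like using the standard Test module. The test descriptions and variable names are mine, and I haven't checked how roast would want the file laid out, so treat it as an illustration rather than something ready to submit:

```raku
use Test;
plan 2;

my $str = "test1 test2 test3 test4";

# Baseline: what $/ holds inside the `if` after a :g match with no block.
my @plain;
if $str ~~ m:g/ (\w+) / { @plain = $/.list.map(~*) }

# Same match, but with an empty embedded block in the regex.
my @block;
if $str ~~ m:g/ (\w+) {} / { @block = $/.list.map(~*) }

is-deeply @plain, [<test1 test2 test3 test4>],
    ':g without a block exposes all four matches in $/';
is-deeply @block, @plain,
    ':g with an empty block leaves the same matches in $/';
```

The second test is the one that captures the expectation in the question: an empty block should not change what ends up in $/.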

raiph answered Oct 20 '22 16:10