While testing various aspects of regexes, I stumbled upon some strange, "inconsistent" behaviour. I was trying to use some code inside a regex, but the same behaviour also occurs with an empty code block. What struck me most was the difference in the match result when I swapped the :g modifier for the :x modifier.
The following code snippets depict the "inconsistent" behaviour.
First without the code block:
use v6.d;
if "test1 test2 test3 test4" ~~ m:g/ (\w+) / {
say ~$_ for $/.list;
}
Result:
test1
test2
test3
test4
Then with the :g modifier and the code block:
use v6.d;
if "test1 test2 test3 test4" ~~ m:g/ (\w+) {} / {
say ~$_ for $/.list;
}
Result:
test4
And finally with the :x modifier and the code block:
use v6.d;
if "test1 test2 test3 test4" ~~ m:x(4)/ (\w+) {} / {
say ~$_ for $/.list;
}
Result:
test1
test2
test3
test4
I expected the three results to be the same, so the second one came as an unpleasant surprise.
Is there any explanation for this behaviour?
TL;DR Issue filed by @jakar and fixed by jnthn.
(Rewritten after more testing and code spelunking.)
This looks to me (and presumably you) like a bug. $/ is somehow getting kiboshed when using :g and an embedded block.
This answer covers:
Zeroing in on the problem
Looking at the compiler's source code
Searching issue queues and/or filing a new issue
my &debug = {;} # start off doing no debugging
$_ = 'aa';
say m / {debug 1} 'a' {debug 2} /; debug 3; # 「a」
say $/ if m / {debug 1} 'a' {debug 2} /; debug 3; # 「a」
say m:x(2) / {debug 1} 'a' {debug 2} /; debug 3; # (「a」 「a」)
say $/ if m:x(2) / {debug 1} 'a' {debug 2} /; debug 3; # (「a」 「a」)
say m:g / {debug 1} 'a' {debug 2} /; debug 3; # (「a」 「a」)
say $/ if m:g / {debug 1} 'a' {debug 2} /; debug 3; # 「a」 <--- Uhoh
Now make debug say something useful and run the first pair (without a regex adverb):
&debug = { say $_, $/.WHICH } # Say location of object bound to `$/`
say m / {debug 1} 'a' {debug 2} /; debug 3; # 「a」
# 1Match|66118928
# 2Match|66118928
# 「a」
# 3Match|66118928
say $/ if m / {debug 1} 'a' {debug 2} /; debug 3; # 「a」
# 1Match|66119072
# 2Match|66119072
# 「a」
# 3Match|66119072
The same simple result in both cases. The match process creates a Match object and sticks with the same one.
Now the two variations with the :x(2) adverb:
say m:x(2) / {debug 1} 'a' {debug 2} /; debug 3; # (「a」 「a」)
# 1Match|66119936
# 2Match|66119936
# 1Match|66120080
# 2Match|66120080
# 1Match|66120224
# (「a」 「a」)
# 3List|67612624
say $/ if m:x(2) / {debug 1} 'a' {debug 2} /; debug 3; # (「a」 「a」)
# 1Match|66120368
# 2Match|66120368
# 1Match|66120512
# 2Match|66120512
# 1Match|66120656
# (「a」 「a」)
# 3List|67612672
This time the match process creates a Match object and sticks with it for one pass, then a second Match object for a second pass, and finally a third Match object for a third pass before it fails to match a third 'a' (and hence the corresponding debug 2 doesn't get called). At the end of the m.../.../ call it has created a List object and bound that to $/.
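That $/ ends up holding a List rather than a single Match after an :x match can be checked directly. A minimal sketch, assuming the same holds when the regex has no embedded block (the `$result` variable is only there for illustration):

```raku
$_ = 'aa';

# Keep the return value so the match is not evaluated in sink context.
my $result = m:x(2)/ 'a' /;

say $/.^name;   # type of the object now bound to $/
say $/.elems;   # number of matches it holds
```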
Next we run the first of the two :g cases:
say m:g / {debug 1} 'a' {debug 2} /; debug 3; # (「a」 「a」)
# 1Match|66119216
# 2Match|66119216
# 1Match|66119360
# 2Match|66119360
# 1Match|66119504
# (「a」 「a」)
# 3Match|66119504
Like the :x(2) case, we try a third time and fail. But the match process does not leave a List bound to $/; it leaves a Match object instead. And it's the one created in the third pass. (Which surprises me.)
Finally, there's the "Uhoh" case:
say $/ if m:g / {debug 1} 'a' {debug 2} /; debug 3; # 「a」 <--- Uhoh
# 1Match|66119648
# 2Match|66119648
# 1Match|66119792
# 2Match|66119792
# 「a」
# 3Match|66119792
Remarkably, the expected third pass appears not to start.
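Until the cause is pinned down, one way to sidestep the problem is to avoid reading $/ after a :g match and use the value the match returns instead. In the `say m:g ...` run above, the returned list was complete even though $/ was not. A minimal sketch, assuming the return value of `.match` with `:g` is just as reliable when the regex contains a block (the `@matches` name is mine):

```raku
use v6.d;

# Capture the list returned by the :g match rather than reading $/ afterwards.
my @matches = "test1 test2 test3 test4".match(/ (\w+) {} /, :g);

# Each element is a Match object; prefix ~ stringifies it to the matched text.
say ~$_ for @matches;
```

This keeps the result in a variable you control, so it does not depend on what ends up bound to $/ once the match has finished.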
It's plausible that exploring the relevant source code would be valuable. I'll write about that here in case it's of interest to you or other readers and in case this is a bug and what I write is of interest to someone fixing it.
Afaict a code block in a regex leads to an AST node being generated here that inserts a sub-node ahead of the statements in the block that does a bind operation:
:op('bind'),
QAST::Var.new( :name('$/'), :scope('lexical') ),
QAST::Op.new(
    QAST::Var.new( :name('$¢'), :scope('lexical') ),
    :name('MATCH'),
    :op('callmethod')
)
My read of the above is that it inserts code that binds the lexical $/ symbol to the result of a .MATCH method call on the object bound to the lexical $¢ symbol immediately prior to running the code in the block.
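The visible effect of such a binding can be seen from ordinary Raku code: inside an embedded block, $/ reflects the match as it stands at that point. A small illustration (not taken from the compiler source; the input string and the `@seen` array are just examples):

```raku
my @seen;

# Each block records what $/ stringifies to at the moment the block runs.
'abc' ~~ / 'a' { @seen.push: ~$/ } 'b' { @seen.push: ~$/ } /;

say @seen;
```

The first block should see only the text matched so far, and the second block should see more of it, which is consistent with $/ being rebound just before each block runs.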
The doc has a section on $¢; I'll quote a sentence:
The main difference between $/ and $¢ is scope: the latter only has a value inside [a] regex
I'm left wondering why $¢ exists and what other differences there are.
Moving on...
I see there's a Raku-level .MATCH. But it barely does anything. So I presume the code that's relevant is here.
At this point I'll pause. I might continue further in a later edit.
If someone comes up with an answer in the next few days demonstrating that what you've shown isn't a bug, or has already been filed as a bug, then fair enough.
Otherwise, please consider doing your own search of the issue queues and/or starting a new issue in whatever issue queue you consider the most appropriate (default to /rakudo/rakudo/issues).
I've already searched the four github.com issue queues I considered plausibly relevant as part of writing this answer:
https://github.com/rakudo/rakudo/issues
https://github.com/Raku/old-issue-tracker/issues
https://github.com/perl6/nqp/issues
https://github.com/moarvm/moarvm/issues
I searched for two keywords that I hoped might uncover an existing issue ("global" and "publish"). No matching issues were relevant. Perhaps you can also look for other keywords that you think a filer might use.
If you do file an issue, please consider adding your tests, or mine or some other variant, converted into standard roast test cases, if you know how to do that.
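A sketch of what such a test could look like with the standard Test module, assuming the intended behaviour is that $/ holds every match after a :g match, with or without an embedded block. The test descriptions are mine, and a real roast test would need to follow that repository's conventions:

```raku
use Test;

plan 2;

$_ = 'aa';

# Baseline: :g without an embedded block.
if m:g/ 'a' / {
    is $/.elems, 2, ':g without a block leaves both matches in $/';
}

# The reported case: the same regex plus an empty block.
if m:g/ 'a' {} / {
    is $/.elems, 2, ':g with an embedded block leaves both matches in $/';
}
```

On a compiler showing the behaviour described above, the second test would be expected to fail, which is what makes it useful as a regression test.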