 

Calling Bool on a Regex does not work as documented

Tags: regex, oop, raku

According to the documentation, the Bool method of the Regex class:

Matches against the caller's $_ variable, and returns True for a match or False for no match.

However, in this example

$_ = "3";
my regex decimal { \d };
say &decimal.Bool;

returns False. Looking at the source, that kind of makes sense, since it matches against a $!topic instance variable. It is not clear, however, that this variable actually corresponds to $_, and the example above suggests it does not. Any idea what actually happens?

asked Jun 29 '19 07:06 by jjmerelo



1 Answer

Short answer: the documentation was more or less accurate for 6.c, but the exact semantics were nowhere near as straightforward as "the caller" (and in fact carried a risk of really odd bugs). The refined behavior is:

  • Anonymous regexes constructed with forms like /.../ and rx:i/.../ will capture the $_ and $/ at the point they are reached in the code (populating the $!topic variable mentioned in the question).
  • Bool and sink will cause a match against that captured $_, and will store the resulting Match object into that $/, provided it is writable.

Since this behavior only applies to anonymous regexes, you'd need to write:

$_ = "3";
my regex decimal { \d };
say /<&decimal>/.Bool;
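As a minimal sketch of both points above (the match runs against the captured $_, and the result lands in $/), something like the following should work on a Rakudo release that implements these semantics. The regex name digits and the variable $matched are made up for illustration.

```raku
# Sketch only: assumes the refined semantics described in this answer.
$_ = "abc123";

my regex digits { \d+ };   # named regex, in the style of the question

# Wrapping it in an anonymous regex makes it capture the current $_ and $/.
my $matched = /<&digits>/.Bool;

say $matched;   # whether the captured $_ matched
say $/;         # the Match object that .Bool stored into $/
```

Calling &digits.Bool directly, as in the question, skips the capture step, which is why it does not see $_.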

Here's the long answer. The goal of the Bool-causes-matching behavior in the first place was for things like this to work:

for $file-handle.lines {
    .say if /^ \d+ ':'/;
}

Here, the for loop populates the topic variable $_, and the if provides a boolean context. The original design was that .Bool would look at the $_ of the caller. However, there were a number of problems with that. Consider:

for $file-handle.lines {
    .say if not /^ \d+ ':'/;
}

In this case, not is the caller of .Bool on the Regex. However, not would also have its own $_, which, as in any subroutine, would be initialized to Any. Thus, in theory, the matching would not work. Except that it did, because what was actually implemented was to walk through the callers until one was found with a $_ that contained a defined value! This is as bad as it sounds. Consider a case like:

sub foo() {
    $_ = some-call-that-might-return-an-undefined-value();
    if /(\d+)/ {
        # do stuff
    }
}
$_ = 'abc123';
foo();

If the call inside of foo were to return an undefined value, perhaps unexpectedly, the matching would have continued walking the caller chain and instead found the value of $_ in the caller of foo. We could in fact have walked many levels deep in the call stack! (Aside: yes, this also meant there were complications around which $/ to update with results too!)

The previous behavior also demanded that $_ have dynamic scope, that is, that it be available for callees to look up in their callers. However, a variable having dynamic scope prevents numerous analyses (both the compiler's and the programmer's) and thus optimizations. With many idioms using $_, this seemed undesirable (nobody wants to see Perl 6 performance guides suggesting "don't use with foo() { .bar } in hot code, use with foo() -> $x { $x.bar } instead"). Thus, 6.d changed $_ to be a regular lexical variable.

That 6.d $_ scoping change had fairly little real-world fallout, but it did cause the semantics of .Bool and .sink on Regex to be examined, since they were the one frequently used thing that relied on $_ being dynamic. That in turn shed light on the "first defined $_" behavior I just described, at which point the use of dynamic scoping started to look like more of a danger than a benefit!

The new semantics mean the programmer writing an anonymous regex can rely on it matching the $_ and updating the $/ that are visible in the scope where they wrote the regex. That seems rather simpler to explain and, in the case they end up with a $_ that isn't defined, a lot less surprising!
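To make the "scope where the regex was written" point concrete, here is a small sketch. The helper matches-here is hypothetical. It assumes the capture-at-construction behavior described above, so the match should be decided by the $_ at the place the regex literal appears, not by the $_ inside the helper.

```raku
# Hypothetical helper: evaluates a regex in boolean context.
sub matches-here($rx) {
    # This sub has its own $_, which is never assigned and so stays undefined.
    return $rx.Bool;
}

$_ = "order 42";

# The regex literal is written here, where $_ is "order 42",
# so this is the $_ it captures.
say matches-here(/\d+/);
```

Under the old caller-walking behavior the result would have depended on whichever caller happened to have a defined $_. With the captured topic, it depends only on the line where the regex was written.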

answered Nov 19 '22 08:11 by Jonathan Worthington
