
Combining state and token throws. Why?

Tags: regex, raku

This works

sub test-string( $string )
{
    my token opening-brace { \( };
    my token closing-brace { \) };
    my token balanced-braces { 
        ( <opening-brace>+ ) <closing-brace> ** { $0.chars } 
    };

    so $string ~~ /^ <balanced-braces> $/;
}

This

sub test-string( $string )
{
    state token opening-brace { \( };
    state token closing-brace { \) };
    state token balanced-braces { 
        ( <opening-brace>+ ) <closing-brace> ** { $0.chars } 
    };

    so $string ~~ /^ <balanced-braces> $/;
}    

dies with

No such method 'opening-brace' for invocant of type 'Match'
  in regex balanced-braces at ch-2.p6 line 13
  in sub test-string at ch-2.p6 line 17
  in block <unit> at ch-2.p6 line 23

I would prefer the second version, since I believe the first version is quite inefficient when it has to set up the tokens every time the function is called. So if this were real code and not a challenge entry, I'd have to make the tokens (file) global.

Why does this even happen?

Holli asked Jan 06 '20 16:01


1 Answer

TL;DR I like take 0. There's a workaround (see take 1) but I don't think it's worthwhile. I don't think a plain my is inefficient (see take 2). I think use of state with a regex/method should be rejected at compile time (see takes 3 and 5) or left as is (see take 4). Unless you're a coding genius willing to persuade jnthn that Rakudo should embark on a dramatically increased exposure to continuations (see take 5).

Why does this even happen? (take 1)

"This" doesn't if you write like so:

sub test-string( $string )
{
    state &opening-brace = token { \( }
    state &closing-brace = token { \) }
    state &balanced-braces = token { 
        ( <&opening-brace>+ ) <&closing-brace> ** { $0.chars } 
    }

    so $string ~~ /^ <&balanced-braces> $/;
}   

(The need for the & in the regex calls slightly surprises me. [1])

Why does this even happen? (take 2)

Why does what happen?

I believe the first version is quite inefficient when it has to set up the tokens every time the function is called.

What do you mean by "believe" and "quite inefficient" and "set up the tokens"? I would expect the regex code to be compiled just once (I'd be shocked if it were compiled each time) but haven't profiled to verify.

Which leads me to a series of questions:

Is your concern purely the time taken to recreate the 3 lexpad entries (&opening-brace etc.; more generally, one per regex) each time the test-string function is called?

Have you actually profiled running your original code and seen a significant problem?

Have you truly measured this and found it to be part of your "critical 3%" in an actual project?
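One rough way to start answering those questions is to time both variants. The following is only a sketch, not a proper benchmark: the names test-outer, test-inner and the outer-* tokens are made up for illustration, the input string and iteration count are arbitrary, and wall-clock timing with now is noisy.

```raku
# File-scoped tokens: declared once, outside any sub (hypothetical names).
my token outer-opening { \( }
my token outer-closing { \) }
my token outer-balanced {
    ( <outer-opening>+ ) <outer-closing> ** { $0.chars }
}

sub test-outer( $string ) {
    so $string ~~ /^ <outer-balanced> $/;
}

# Sub-scoped tokens, as in the question's first (working) version.
sub test-inner( $string ) {
    my token opening-brace { \( };
    my token closing-brace { \) };
    my token balanced-braces {
        ( <opening-brace>+ ) <closing-brace> ** { $0.chars }
    };

    so $string ~~ /^ <balanced-braces> $/;
}

# Time each variant over the same input; numbers will vary per machine.
for &test-outer, &test-inner -> &test {
    my $start = now;
    test('((()))') for ^10_000;
    say "{ &test.name }: { now - $start }s";
}
```

If the two timings come out close, the per-call cost of the my declarations is unlikely to be in anyone's "critical 3%".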

Why does this even happen? (take 3)

The state declarator does a reasonable thing with subs -- it produces a compile-time error:

state sub foo {}    # Compile time error: "Cannot use 'state' with sub declaration"
state my sub foo {} # Compile time error: "Type 'my' is not declared"

But with a method (which is what a regex is under the covers) it compiles but does nothing useful:

state method foo {} # Compiles, but I failed to find a way to access `foo`
state regex bar {.}  # Same

I've looked in Rakudo's GH issues queue and failed to find an issue discussing anything like the last two lines of code above (which are essentially the same as your token case). Perhaps folk haven't noticed this or at least didn't feel it would be helpful to file a bug?

Why does this even happen? (take 4)

So that you would post an SO question documenting that state regex should either be rejected at compile time or do something useful. And so that @Scimon++ would document another way to look at things. And so that I would add some more.

Why does this even happen? (take 5)

<Your Compiler Code Goes Here>

Because Raku is our MMORPG. If you would prefer to see the state declarator do something useful when used with a routine declaration (presumably it should either produce a compile-time error, like it currently does with a sub, or do some fancy continuation thing within the constraint of the "scoped continuations" atop which Raku is built), then that work is plausibly just a "smop" away given that the Rakudo compiler is mostly written in Raku. Someone has deliberately made state on a sub a compile-time error, and the continuation notion would be a truly colossal project, so I think the appropriate thing, if any, in the next few years, would be to make state on a method or rule also a compile-time error.

Or, perhaps more appropriately still, now this is covered by an SO, with a documented alternative (a grammar) and workaround (take 1), it's just time to move on to the next level...
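For completeness, here is roughly what the grammar alternative can look like. This is my own minimal sketch rather than a copy of the other answer; the grammar name BalancedBraces is an assumption chosen for illustration. The tokens become methods of a grammar declared once at file scope, so nothing regex-related is declared inside the sub.

```raku
# Hypothetical grammar name; the tokens mirror the ones in the question.
grammar BalancedBraces {
    token TOP             { <balanced-braces> }
    token opening-brace   { \( }
    token closing-brace   { \) }
    token balanced-braces {
        ( <opening-brace>+ ) <closing-brace> ** { $0.chars }
    }
}

sub test-string( $string ) {
    # .parse must match the whole string, so no explicit ^ and $ are needed.
    so BalancedBraces.parse($string);
}

say test-string('((()))');
say test-string('(()');
```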

Footnotes

[1] See my answer to Difference in ... regex scope. The behavior of the regexes declared with state appears not to follow a straightforward reading of the design speculation I quoted in that answer. And at least the following bit of my narrative from that answer is wrong too...

"<bar> is as explained above. It preferentially resolves to an early bound lexical (my/our) routine/rule named &bar."

...because in the take 1 code of this answer the regex calls have to be prefixed with an & to work. Maybe it's pure accident they work at all.

raiph answered Feb 15 '23 17:02