
Is this Perl 6 grammar broken, or is it exposing a bug?

The following is a test case based on a bigger grammar. The goal is to parse the subset of YAML that is used in Unity3D asset files. The interesting function is the keyed-array matcher. This matcher loops, matching data[i]: val as <array-name(index)><indexer-and-value(index, name)>. <array-name> is overloaded, so the first time it's called (index 0), it will match any name. Subsequent iterations, when the index is nonzero, will only match the same name that was seen before.

The crux of the issue is that when index>0, there should always be a known name for the array, and it should be passed to the matcher as a parameter. It's not—the interpreter is giving the following error:

Cannot resolve caller array-name(Match.new(...): 1, Nil, 1); none of these signatures match:
    (Prefab $: Int $ where { ... }, $prevName, Int $indent, *%_)
    (Prefab $: Int $idx, Match $ (@ (Any $prevName, *@)), Int $indent, *%_)
    (Prefab $: Int $idx, @ (Any $prevName, *@), Int $indent, *%_)

So the index is 1 but there was no previously matched name. That parameter is Nil, which does not make sense. Note the commented out block in that function: #{ }. If this is uncommented, the test case stops failing. There is no branching based on longest match (| operator or proto matchers), so adding extra stuff in the matcher should not change the parse.

The test input is included in the test case. Here it is:

#use Grammar::Tracer;
#use Grammar::Debugger;

grammar Prefab {
    token TOP {
        <key> ':' <value=hash-multiline(1)> \n
    }

    token key { \w+ }

    token kvpair(Int $indent=0) {
        [
        || <key> ':'  <hash-multiline($indent+1)>
        || <keyed-array($indent)>
        || <key> ': ' (\w+)
        ]
    }

    token keyed-array(Int $indent) {
        # Keys are built in to the list:
        # look for arrayname[0] first, then match subsequent lines more strictly, based on name[idx]
        :my $idx = 0;
        [
            <array-name($idx, $<array-name>, $indent)>
            <indexer-and-value($idx++, $indent)>
            #{ } # XXX this fixes it, somehow
        ] +% \n

    }
    multi token array-name(0, $prevName, Int $indent) {
        # the first element doesn't need to match indentation
        \w+
    }

    multi token array-name(Int $idx, Match $ ([$prevName, *@]), Int $indent) {
        <.indent($indent)>
        $prevName
    }
    # todo: Can I remove this overload? In testing, the parameter was sometimes an array, sometimes a Match
    multi token array-name(Int $idx, [$prevName, *@], Int $indent) {
        <.indent($indent)>
        $prevName
    }

    # arr[2]: foo
    #    ^^^^^^^^ match this
    token indexer-and-value(Int $idx, Int $indent) {
        '[' ~ ']' $idx
        [
        || ':'  <hash-multiline($indent+1)>
        || ': ' \w+
        ]
    }


    token hash-multiline(Int $indent=0) {
        # Note: the hash does not need a newline if it's over after the first (inline) kv-pair!
        # optional first line which is on the same line as the previous text:
        [
        || [<kvpair($indent)>]  [ \n <.indent($indent)> <kvpair($indent)> ]*
        ||                      [ \n <.indent($indent)> <kvpair($indent)> ]+
        ]
    }

    multi token indent(0) {
        ^^ <?>
    }
    multi token indent(Int $level) {
        ^^ ' ' ** {2*$level}
    }
}

sub MAIN() {
    say so Prefab.parse($*kv-list);
}

my $*kv-list = q:to/END/;
Renderer:
  m_Color[0]: red
END
Asked by piojo, Sep 09 '17 12:09


1 Answer

timotimo explained the problem on IRC: the match variables ($/, $0, $1, and named matches) aren't global. When a matcher begins, the match variables are already populated. For performance reasons, they mostly* aren't updated within the rest of the matcher body. However, when a code block is encountered (even an empty one), the match variables are updated. So the "bug" workaround is actually a valid solution: include an empty block to force the match variables to update.

* $0 seems to be updated and available immediately. Probably the other numbered matches as well.

UPDATE: it seems like the only time match variables aren't immediately available is when you use them in a code-like context without using a block, such as in the argument list to a different matcher. Here, the match variable is immediately available after the previous match:

my regex word { \w+ };
say 'hellohello' ~~ /<word> $<word>/

But this example used as a parameter fails:

my regex repeated($x) { [$x]+ };
say 'ooxoo' ~~ / ^ <repeated('o')> . <repeated($<repeated>)> $ /

Unless you add a block to force named match variables to be updated:

my regex repeated($x) { [$x]+ };
say 'ooxoo' ~~ / ^ <repeated('o')> . {} <repeated($<repeated>)> $ /
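
The same workaround can be sketched inside a grammar, which is closer to the keyed-array case in the question. This is only a minimal illustration, not the original Prefab grammar: the names SameName, name, and same-name are made up for the example, and it assumes exactly two array lines. The empty block sits between the first capture and the parameterized call, so $<name> should be populated by the time the argument list is evaluated.

```raku
# Hypothetical mini-grammar (not the Prefab grammar from the question).
grammar SameName {
    token TOP {
        <name> '[0]: ' \w+ \n
        {}   # empty block: forces $/ to be updated before the next call's arguments are evaluated
        <same-name(~$<name>)> '[1]: ' \w+ \n?
    }

    # First line: any name is accepted.
    token name { \w+ }

    # Later lines: only the previously seen name, passed in as a string.
    token same-name(Str $prev) { $prev }
}

# Both lines use the same array name.
say so SameName.parse("m_Color[0]: red\nm_Color[1]: blue\n");

# The second line uses a different array name.
say so SameName.parse("m_Color[0]: red\nm_Other[1]: blue\n");
```

Passing the name as a plain string (via ~) also sidesteps the question's issue of the parameter sometimes being a Match and sometimes an array, at the cost of only working when there is a single earlier capture with that name.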
Answered by piojo, Nov 11 '22 01:11
