I want to learn how to create an abstract syntax tree for nested tuples using a Perl regexp with embedded code execution. I can easily program that using a Perl 6 grammar and I'm aware that using parsing modules would simplify the task in Perl 5, but I think for such simple tasks I should be able to do it without modules by learning how to mechanically translate from grammar definitions. I couldn't find a way to dereference $^R, so I try to undo the involuntary nesting at the end of the TUPLE rule definition, but the output is incorrect, e.g. some substrings appear twice.
use v5.10;
use Data::Dumper;
while (<DATA>) {
chomp;
/(?&TUPLE)(?{$a = $^R})
(?(DEFINE)
(?<TUPLE>
T \s (?&ELEM) \s (?&ELEM)
(?{ [$^R->[0][0],[$^R->[0][1],$^R[1]]] })
)
(?<ELEM>
(?: (a) (?{ [$^R,$^N] }) | \( (?&TUPLE) \) )
)
)/x;
say Dumper $a;
}
__DATA__
T a a
T (T a a) a
T a (T a a)
T (T a a) (T a a)
T (T (T a a) a) (T a (T a a))
Expected output data structure is a nested list:
['a','a'];
['a',['a','a']];
[['a','a'],'a'];
[['a','a'],['a','a']];
[[['a','a'],'a'],['a',['a','a']]]
For reference I'll also share my working Perl 6 code:
grammar Tuple {
token TOP { 'T ' <elem> ' ' <elem> }
token elem { 'a' | '(' <TOP> ')'}
}
class Actions {
method TOP($/) {make ($<elem>[0].made, $<elem>[1].made)}
method elem($/) {make $<TOP> ?? $<TOP>.made !! 'a'}
}
Trying to figure out how to use (?{ ... })
constructs is almost always not worth the effort. In particular, this can have unexpected behaviour together with backtracking. It is also very difficult to debug such regexes since the control flow tends to be non-obvious.
Instead, it tends to be easier to do write an ad-hoc recursive descent parser with m//gc
-style lexing: Each Perl string stores its last match offset. When applying a regex with m/\G ... /gc
in scalar context, it can anchor at the last offset and advances the offset iff the match succeeds.
Here:
use strict;
use warnings;
use Test::More;
sub parse {
my ($str) = @_;
pos($str) = 0; # set match position to beginning
return parse_tuple(\$str);
}
sub parse_tuple {
my ($ref) = @_;
$$ref =~ /\G T \s/gcx or die error($ref, "expected tuple start T");
my $car = parse_element($ref);
$$ref =~ /\G \s /gcx or die error($ref, "expected space between tuple elements");
my $cdr = parse_element($ref);
return [$car, $cdr];
}
sub parse_element {
my ($ref) = @_;
return 'a' if $$ref =~ /\G a /gcx;
$$ref =~ /\G \( /gcx or die error($ref, "expected opening paren for nested tuple");
my $tuple = parse_tuple($ref);
$$ref =~ /\G \) /gcx or die error($ref, "expected closing paren after nested tuple");
return $tuple;
}
sub error {
my ($ref, $msg) = @_;
my $snippet = substr $$ref, pos($$ref), 20;
return "$msg just before '$snippet...'";
}
is_deeply parse('T a a'), ['a','a'];
is_deeply parse('T (T a a) a'), [['a','a'],'a'];
is_deeply parse('T a (T a a)'), ['a',['a','a']];
is_deeply parse('T (T a a) (T a a)'), [['a','a'],['a','a']];
is_deeply parse('T (T (T a a) a) (T a (T a a))'), [[['a','a'],'a'],['a',['a','a']]];
done_testing;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With