Parsing nested tuples using perl regexp with $^R

Question

I want to learn how to create an abstract syntax tree for nested tuples using a Perl regexp with embedded code execution. I can easily program that using a Perl 6 grammar and I'm aware that using parsing modules would simplify the task in Perl 5, but I think for such simple tasks I should be able to do it without modules by learning how to mechanically translate from grammar definitions. I couldn't find a way to dereference $^R, so I try to undo the involuntary nesting at the end of the TUPLE rule definition, but the output is incorrect, e.g. some substrings appear twice.

use v5.10;
use Data::Dumper;

while (<DATA>) {
    chomp;
    /(?&TUPLE)(?{$a = $^R})
    (?(DEFINE)
        (?<TUPLE>
            T \s (?&ELEM) \s (?&ELEM)
            (?{ [$^R->[0][0],[$^R->[0][1],$^R[1]]] })
        )
        (?<ELEM>
            (?: (a) (?{ [$^R,$^N] }) | $ (?&TUPLE) $ )
        )
    )/x;
    say Dumper $a;
}

__DATA__
T a a
T (T a a) a
T a (T a a)
T (T a a) (T a a)
T (T (T a a) a) (T a (T a a))

Expected output data structure is a nested list:

['a','a'];
['a',['a','a']];
[['a','a'],'a'];
[['a','a'],['a','a']];
[[['a','a'],'a'],['a',['a','a']]]

For reference I'll also share my working Perl 6 code:

grammar Tuple {
  token TOP { 'T ' <elem> ' ' <elem> }
  token elem { 'a' | '(' <TOP> ')'}
}
class Actions {
  method TOP($/) {make ($<elem>[0].made, $<elem>[1].made)}
  method elem($/) {make $<TOP> ?? $<TOP>.made !! 'a'}
}

amon · Accepted Answer

Trying to figure out how to use (?{ ... }) constructs is almost always not worth the effort. In particular, this can have unexpected behaviour together with backtracking. It is also very difficult to debug such regexes since the control flow tends to be non-obvious.

Instead, it tends to be easier to do write an ad-hoc recursive descent parser with m//gc-style lexing: Each Perl string stores its last match offset. When applying a regex with m/\G ... /gc in scalar context, it can anchor at the last offset and advances the offset iff the match succeeds.

Here:

use strict;
use warnings;
use Test::More;

sub parse {
  my ($str) = @_;
  pos($str) = 0;  # set match position to beginning
  return parse_tuple(\$str);
}

sub parse_tuple {
  my ($ref) = @_;
  $$ref =~ /\G T \s/gcx or die error($ref, "expected tuple start T");
  my $car = parse_element($ref);
  $$ref =~ /\G \s /gcx or die error($ref, "expected space between tuple elements");
  my $cdr = parse_element($ref);
  return [$car, $cdr];
}

sub parse_element {
  my ($ref) = @_;
  return 'a' if $$ref =~ /\G a /gcx;

  $$ref =~ /\G $ /gcx or die error($ref, "expected opening paren for nested tuple");
  my $tuple = parse_tuple($ref);
  $$ref =~ /\G $ /gcx or die error($ref, "expected closing paren after nested tuple");
  return $tuple;
}

sub error {
  my ($ref, $msg) = @_;
  my $snippet = substr $$ref, pos($$ref), 20;
  return "$msg just before '$snippet...'";
}

is_deeply parse('T a a'), ['a','a'];
is_deeply parse('T (T a a) a'), [['a','a'],'a'];
is_deeply parse('T a (T a a)'), ['a',['a','a']];
is_deeply parse('T (T a a) (T a a)'), [['a','a'],['a','a']];
is_deeply parse('T (T (T a a) a) (T a (T a a))'), [[['a','a'],'a'],['a',['a','a']]];
done_testing;

Parsing nested tuples using perl regexp with $^R

Tags:

regex

perl

rubystallion

1 Answers

amon

Recent Activity

Donate For Us

Parsing nested tuples using perl regexp with $^R

Tags:

regex

perl

rubystallion

1 Answers

amon

Related questions

Recent Activity

Donate For Us