Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Treetop grammar infinite loop

I have had some ideas for a new programming language floating around in my head, so I thought I'd take a shot at implementing it. A friend suggested I try using Treetop (the Ruby gem) to create a parser. Treetop's documentation is sparse, and I've never done this sort of thing before.

My parser is acting like it has an infinite loop in it, but with no stack traces; it is proving difficult to track down. Can somebody point me in the direction of an entry-level parsing/AST guide? I really need something that list rules, common usage etc for using tools like Treetop. My parser grammer is on GitHub, in case someone wishes to help me improve it.

class {
  initialize = lambda (name) {
    receiver.name = name
  }

  greet = lambda {
    IO.puts("Hello, #{receiver.name}!")
  }
}.new(:World).greet()
like image 598
ravinggenius Avatar asked May 24 '11 00:05

ravinggenius


1 Answers

I asked treetop to compile your language into an .rb file. That gave me something to dig into:

$ tt -o /tmp/rip.rb /tmp/rip.treetop

Then I used this little stub to recreate the loop:

require 'treetop'
load '/tmp/rip.rb'
RipParser.new.parse('')

This hangs. Now, isn't that interesting! An empty string reproduces the behavior just as well as the dozen-or-so-line example in your question.

To find out where it's hanging, I used an Emacs keyboard macro to edit rip.rb, adding a debug statement to the entry of each method. For example:

def _nt_root
  p [__LINE__, '_nt_root'] #DEBUG
  start_index = index

Now we can see the scope of the loop:

[16, "root"]
[21, "_nt_root"]
[57, "_nt_statement"]
...
[3293, "_nt_eol"]
[3335, "_nt_semicolon"]
[3204, "_nt_comment"]
[57, "_nt_statement"]
[57, "_nt_statement"]
[57, "_nt_statement"]
...

Further debugging from there reveals that an integer is allowed to be an empty string:

rule integer
   digit*
end

This indirectly allows a statement to be an empty string, and the top-level rule statement* to forever consume empty statements. Changing * to + fixes the loop, but reveals another problem:

/tmp/rip.rb:777:in `_nt_object': stack level too deep (SystemStackError)
        from /tmp/rip.rb:757:in `_nt_compound_object'
        from /tmp/rip.rb:1726:in `_nt_range'
        from /tmp/rip.rb:1671:in `_nt_special_literals'
        from /tmp/rip.rb:825:in `_nt_literal_object'
        from /tmp/rip.rb:787:in `_nt_object'
        from /tmp/rip.rb:757:in `_nt_compound_object'
        from /tmp/rip.rb:1726:in `_nt_range'
        from /tmp/rip.rb:1671:in `_nt_special_literals'
         ... 3283 levels...

Range is left-recursing, indirectly, via special_literals, literal_object, object, and compound_object. Treetop, when faced with left recursion, eats stack until it pukes. I don't have a quick fix for that problem, but at least you've got a stack trace to go from now.

Also, this is not your immediate problem, but the definition of digit is odd: It can either one digit, or multiple. This causes digit* or digit+ to allow the (presumably) illegal integer 1________2.

like image 178
Wayne Conrad Avatar answered Nov 07 '22 06:11

Wayne Conrad