How to manually construct an AST?

I'm currently learning about parsing, but I'm a bit confused about how to generate an AST. I have written a parser that correctly verifies whether an expression conforms to a grammar (it is silent when the expression conforms and raises an exception when it does not). Where do I go from here to build an AST? I found plenty of information on building my LL(1) parser, but very little on going on to build the AST.

My current code (written in very simple Ruby, and including a lexer and a parser) can be found in this GitHub gist: https://gist.github.com/e9d4081b7d3409e30a57

Can someone explain how I get from what I currently have to an AST?

Alternatively, if you are unfamiliar with Ruby but know C, could you tell me how to build an AST for the C code in the Wikipedia article on recursive descent parsing?

Please note, I do not want to use a parser generator like yacc or ANTLR to do the work for me; I want to do everything from scratch.

Thanks!

asked Apr 12 '12 10:04

horseyguy

1 Answer

You need to associate each symbol that you match with a callback that constructs that little part of the tree. For example, let's take a fairly common construct: nested function calls.

a(b())

Your terminal tokens here are something like:

L_PAREN = '('
R_PAREN = ')'
IDENTIFIER = [a-z]+
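
As a rough illustration, terminals like these could be produced by a small lexer built on Ruby's standard `StringScanner`. The `TOKEN_PATTERNS` table, the `tokenize` method, and the `[type, text]` token shape are assumptions made for this sketch, not taken from the asker's gist:

```ruby
require "strscan"

# Hypothetical token patterns mirroring the terminals listed above.
TOKEN_PATTERNS = {
  l_paren:    /\(/,
  r_paren:    /\)/,
  identifier: /[a-z]+/
}.freeze

# Turns a string such as "a(b())" into an array of [type, text] pairs.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    next if scanner.skip(/\s+/) # ignore whitespace between tokens
    # Try each pattern in turn; find stops at the first one that matches.
    type, _pattern = TOKEN_PATTERNS.find { |_, pattern| scanner.scan(pattern) }
    raise ArgumentError, "unexpected character at #{scanner.pos}" unless type
    tokens << [type, scanner.matched]
  end
  tokens
end
```

Calling `tokenize("a()")` under these assumptions yields an identifier token followed by the two parenthesis tokens, which is the input the grammar rules operate on.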

And your nonterminal symbols are something like:

FUNCTION_CALL = IDENTIFIER, L_PAREN, R_PAREN
or:
FUNCTION_CALL = IDENTIFIER, L_PAREN, FUNCTION_CALL, R_PAREN

The second alternative for the FUNCTION_CALL rule is recursive: FUNCTION_CALL appears on its own right-hand side.

You already have a parser that knows when it has matched a valid rule. The bit you're missing is a callback attached to each rule, which receives the rule's matched components as inputs and returns a value, usually the node that represents that rule in the AST.

Imagine if the first alternative from our FUNCTION_CALL rule above had a callback:

Proc.new do |id_tok, l_paren_tok, r_paren_tok|
  { item: :function_call, name: id_tok, args: [] }
end

That would mean that the AST resulting from matching:

a()

would be:

{
  item: :function_call,
  name: "a",
  args: []
}

Now extend that to the more complex a(b()). Because the parser is recursive, it finishes recognizing the inner b() first, and the callback for that match returns what we have above, but with "b" instead of "a".

Now let's define the callback attached to the rule that matches the second alternative. It's very similar, except it also deals with the argument it was passed:

Proc.new do |id_tok, l_paren_tok, func_call_item, r_paren_tok|
  { item: :function_call, name: id_tok, args: [ func_call_item ] }
end

Because the parser has already recognized b() and that part of the AST was returned from your callback, the resulting tree is now:

{
  item: :function_call,
  name: "a",
  args: [
    {
      item: :function_call,
      name: "b",
      args: []
    }
  ]
}
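
To see the two callbacks compose without any parser involved, they can be invoked by hand in the order a recursive parser would invoke them. The constant names `LEAF_CALL` and `NESTED_CALL` are made up for this sketch, and tokens are assumed to be plain strings:

```ruby
# Callback for FUNCTION_CALL = IDENTIFIER, L_PAREN, R_PAREN
LEAF_CALL = Proc.new do |id_tok, _l_paren_tok, _r_paren_tok|
  { item: :function_call, name: id_tok, args: [] }
end

# Callback for FUNCTION_CALL = IDENTIFIER, L_PAREN, FUNCTION_CALL, R_PAREN
NESTED_CALL = Proc.new do |id_tok, _l_paren_tok, func_call_item, _r_paren_tok|
  { item: :function_call, name: id_tok, args: [func_call_item] }
end

# The inner b() is matched first, so its node exists before a(...) is built.
inner = LEAF_CALL.call("b", "(", ")")
outer = NESTED_CALL.call("a", "(", inner, ")")
```

Here `outer` is the nested hash shown above: the node for b() was built first and then handed to the callback for the enclosing call as an ordinary argument.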

Hopefully this gives you some food for thought. Pass the tokens (and any already-built child nodes) that each rule matches into a routine that constructs one very small part of your AST.
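
Putting this together for a hand-written recursive descent parser, one possible sketch follows. It is not the asker's code: the class name `FunctionCallParser`, its `ParseError`, and the use of plain strings as tokens are all assumptions, and the two grammar alternatives are folded into one method that uses a single token of lookahead. Instead of a separate Proc, each rule method simply returns its node, which plays the same role as the callback:

```ruby
# A minimal recursive descent sketch; class and method names are hypothetical.
class FunctionCallParser
  ParseError = Class.new(StandardError)

  # tokens: array of token strings, e.g. ["a", "(", "b", "(", ")", ")"]
  def initialize(tokens)
    @tokens = tokens
    @pos = 0
  end

  # Parses one FUNCTION_CALL and requires that it consumes all input.
  def parse
    node = function_call
    raise ParseError, "unexpected token #{peek.inspect}" unless peek.nil?
    node
  end

  private

  # FUNCTION_CALL = IDENTIFIER, L_PAREN, [ FUNCTION_CALL ], R_PAREN
  def function_call
    name = expect(/\A[a-z]+\z/)
    expect(/\A\(\z/)
    # One token of lookahead: anything other than ")" must start a nested call.
    args = peek == ")" ? [] : [function_call]
    expect(/\A\)\z/)
    # The "callback" step: build this rule's node from the matched parts.
    { item: :function_call, name: name, args: args }
  end

  def peek
    @tokens[@pos]
  end

  # Consumes the current token if it matches the pattern, otherwise raises.
  def expect(pattern)
    token = peek
    unless token && token.match?(pattern)
      raise ParseError, "unexpected token #{token.inspect} at position #{@pos}"
    end
    @pos += 1
    token
  end
end

tokens = "a(b())".scan(/[a-z]+|[()]/)
ast = FunctionCallParser.new(tokens).parse
```

A parser that previously only validated input usually needs just this one change per rule method: return the constructed node instead of returning nothing.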

answered Sep 19 '22 09:09

d11wtq

Tags: parsing, ruby, lexer, ll, abstract-syntax-tree
