I am currently building a parser by hand. It is a LL(1) parser. At the moment, it is a great recognizer: its function parse(List tokens) decides whether or not tokens is a member of the language or not.
Now, I want to build the corresponding AST for that input. However, I know how to implement it in a recursive descent way (already did it). That is, for the challenge, I implement my stack using a stack with the classical algorithm:
next <- first token of the input
stack <- START_SYMBOL
do {
top <- stack.pop()
if (top is a terminal and top == next) {
next <- next token of the input
} else if (top is a non terminal and PARSING_TABLE[top, next] exists) {
stack.push(PARSING_TABLE[top, next]);
} else {
return invalid input;
}
} while (stack is not empty);
return valid input;
where the PARSING_TABLE is the LL(1) table. However, I wonder how to implement the part which build the AST in such a configuration. I do not ask for complete implementation, more for implementation idea.
Thanks !
Your stack can be annotated so that it contains the AST entry reference (i.e. rule number + position in rule + target data where to store) + (terminal/non terminal)
Your initial stack <- START_SYMBOL
is annotated to store its result in the AST root.
Basically, your pop()
selects the current AST construct. Then the next <- next token
saves the value in your AST. The stack.push(PARSING_TABLE[top, next]);
opens a new AST list and writes it in the construct corresponding to top
, and generates in each entry of the stack the 'rule number + position + target list'.
When you parsing is finished, you have the entire tree.
In a precise AST, you might want to ignore some tokens. This can be done via appropriate annotations in the stack set during the push() part. The typical way is to attach to each of your rules (A -> B C) some meta information, for example, what is to be kept and what is the nature of the result.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With