Basically, I need to look ahead to know whether a certain token exists, but without matching it (i.e. so that another parser rule can still match it).
The specific problem is an "END-ALL" clause. The language has constructs like "IF" (closed by an "END-IF"), "FOR" (closed by an "END-FOR"), and so on.
But one can choose to close all such open constructs at once with an "END-ALL" (thus removing the need for the individual "END-IF" or "END-FOR" clauses).
Is there any way I can properly implement this?
You could do that by creating a boolean flag inside your if (and for) statements that tracks whether an ENDALL needs to be consumed or whether a look-ahead is enough. This boolean flag is passed to the parser rule that matches the end of an if code block.
A little demo:
grammar T;

options {
  output=AST;
}

tokens {
  BLOCK;
  ASSIGN;
}

@parser::members {
  // true while no enclosing 'if' is being parsed
  private boolean flag = true;
}

parse
  :  block EOF -> block
  ;

block
  :  stat* -> ^(BLOCK stat*)
  ;

stat
  :  PRINT expression -> ^(PRINT expression)
  |  assignment
  |  ifStat
  ;

assignment
  :  ID '=' expression -> ^(ASSIGN ID expression)
  ;

ifStat
@init{
  // only the outermost 'if' is allowed to consume ENDALL
  boolean consumeEndAll = false;
  if(flag) {
    consumeEndAll = true;
    flag = false;
  }
}
@after {
  // the outermost 'if' is done: reset the flag for the next statement
  if(consumeEndAll) {
    flag = true;
  }
}
  :  IF expression DO block end[consumeEndAll] -> ^(IF expression block)
  ;

expression
  :  NUMBER
  |  TRUE
  |  FALSE
  |  ID
  ;

end [boolean consumeEndAll]
  :  END
  |  EOF
  |  {consumeEndAll}?=> ENDALL
  |  {input.LT(1).getType() == ENDALL}?=> { /* consume no token */ }
  ;

PRINT  : 'print';
ENDALL : 'endall';
END    : 'end';
IF     : 'if';
DO     : 'do';
TRUE   : 'true';
FALSE  : 'false';
NUMBER : '0'..'9'+ ('.' '0'..'9'+)?;
ID     : ('a'..'z' | 'A'..'Z')+;
SPACE  : (' ' | '\t' | '\r' | '\n') {skip();};
The gated semantic predicates inside the end rule (the { ... }?=> parts) cause the rule either to consume ENDALL or only to look ahead for the presence of such a token without consuming it.
More about predicates: What is a 'semantic predicate' in ANTLR?
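To make the control flow easier to follow outside of ANTLR, here is a minimal hand-written sketch of the same idea in plain Java. It is only an illustration under some assumptions: the class name EndAllSketch, the whitespace-only tokenizer, the reduced statement set (just print and if), and the S-expression string output are all hypothetical and are not part of the grammar or of the ANTLR API. The point is the end method: only the outermost if consumes endall, while nested ifs merely peek at it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written parser for illustration; not generated by ANTLR.
final class EndAllSketch {
    private static final String EOF = "<EOF>";

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;
    // Plays the role of the grammar's `flag`: true while no 'if' is open.
    private boolean flag = true;

    private EndAllSketch(String source) {
        // Simplified tokenizer (an assumption): tokens are whitespace-separated.
        for (String t : source.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
    }

    // Look at the next token without consuming it.
    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : EOF;
    }

    // Consume and return the next token.
    private String next() {
        String t = peek();
        if (pos < tokens.size()) pos++;
        return t;
    }

    private String block() {
        StringBuilder sb = new StringBuilder("(BLOCK");
        while (peek().equals("print") || peek().equals("if")) {
            sb.append(' ').append(stat());
        }
        return sb.append(')').toString();
    }

    private String stat() {
        if (next().equals("print")) {
            return "(print " + next() + ")";
        }
        // Otherwise the consumed token was 'if'.
        boolean consumeEndAll = flag; // true only for the outermost 'if'
        flag = false;
        String condition = next();
        String doToken = next();
        if (!doToken.equals("do")) {
            throw new IllegalStateException("expected 'do' but found '" + doToken + "'");
        }
        String body = block();
        end(consumeEndAll);
        if (consumeEndAll) flag = true; // outermost 'if' finished: reset
        return "(if " + condition + " " + body + ")";
    }

    private void end(boolean consumeEndAll) {
        String t = peek();
        if (t.equals("end")) {
            next();                       // a regular 'end' is always consumed
        } else if (t.equals("endall")) {
            if (consumeEndAll) next();    // outermost 'if' consumes it;
                                          // nested 'if's only look ahead
        } else if (!t.equals(EOF)) {
            throw new IllegalStateException("unexpected token '" + t + "'");
        }
    }

    // Parses the whole input and returns the tree as an S-expression string.
    static String parse(String source) {
        EndAllSketch parser = new EndAllSketch(source);
        String tree = parser.block();
        if (!parser.peek().equals(EOF)) {
            throw new IllegalStateException("unexpected token '" + parser.peek() + "'");
        }
        return tree;
    }

    public static void main(String[] args) {
        String nested = "if 1 do print a if 2 do print b print c if 3 do end end end print d";
        String endAll = "if 1 do print a if 2 do print b print c if 3 do endall print d";
        System.out.println(parse(nested));
        System.out.println(parse(endAll));
    }
}
```

Running main prints one tree per input, so the two lines can be compared directly. Keeping the "am I the outermost if?" state in a single field, as the grammar does, is the simplest option; passing it down as a parameter would work as well.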
The parser generated by the grammar above will produce identical ASTs for the following two scripts (the first closes every if with its own end, the second closes all three with a single endall):
if 1 do
  print a
  if 2 do
    print b
    print c
    if 3 do
    end
  end
end
print d

if 1 do
  print a
  if 2 do
    print b
    print c
    if 3 do
endall
print d
namely, the following AST:
(image generated using graphviz-dev.appspot.com)
You can test this all with the following Java class:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String source =
        "if 1 do        \n" +
        "  print a      \n" +
        "  if 2 do      \n" +
        "    print b    \n" +
        "    print c    \n" +
        "    if 3 do    \n" +
        "endall         \n" +
        "print d          ";
    System.out.println(source + "\n------------------\n");
    TLexer lexer = new TLexer(new ANTLRStringStream(source));
    TParser parser = new TParser(new CommonTokenStream(lexer));
    CommonTree tree = (CommonTree)parser.parse().getTree();
    DOTTreeGenerator gen = new DOTTreeGenerator();
    StringTemplate st = gen.toDOT(tree);
    System.out.println(st);
  }
}