
In ANTLR, can I look-ahead for specific tokens without actually matching them?

Tags: antlr

Basically, I need to look ahead to find out whether a certain token is present, but without matching it (i.e. so that another parser rule can still match it).

The specific problem is an "END-ALL" clause. The language has constructs like "IF" (closed by an "END-IF"), "FOR" (closed by an "END-FOR"), and so on.

But one can choose to globally close all such open loops with an "END-ALL" (thus removing the need for the actual "END-IF" or "END-FOR" clauses).

Is there any way I can properly implement this?

Asked by bundat, Jun 08 '11 04:06



1 Answer

You could do that by keeping a boolean flag that tracks whether the current if (or for) statement must consume the ENDALL token or whether a look-ahead is enough. The flag is passed as a parameter to the parser rule that matches the end of an if code block.

A little demo:

grammar T;

options {
  output=AST;
}

tokens {
  BLOCK;
  ASSIGN;
}

@parser::members {
  // true while no enclosing 'if' has taken responsibility for consuming ENDALL
  private boolean flag = true;
}

parse
  :  block EOF -> block
  ;

block
  :  stat* -> ^(BLOCK stat*)
  ;

stat
  :  PRINT expression -> ^(PRINT expression)
  |  assignment
  |  ifStat
  ;

assignment
  :  ID '=' expression -> ^(ASSIGN ID expression)
  ;

ifStat
@init{
  // Only the outermost open 'if' consumes ENDALL; nested ones merely look ahead for it.
  boolean consumeEndAll = false;
  if(flag) {
    consumeEndAll = true;
    flag = false;
  }
}
@after {
  if(consumeEndAll) {
    flag = true;
  }
}
  :  IF expression DO block end[consumeEndAll] -> ^(IF expression block)
  ;

expression
  :  NUMBER
  |  TRUE
  |  FALSE
  |  ID
  ;

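end [boolean consumeEndAll]
  :                                       END     // a regular 'end' closes this block
  |                                       EOF     // end of input also closes it
  |  {consumeEndAll}?=>                   ENDALL  // outermost 'if': really consume ENDALL
  |  {input.LT(1).getType() == ENDALL}?=> { /* consume no token */ } // nested 'if': only peek
  ;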

PRINT  : 'print';  
ENDALL : 'endall';
END    : 'end';
IF     : 'if';
DO     : 'do';
TRUE   : 'true';
FALSE  : 'false';
NUMBER : '0'..'9'+ ('.' '0'..'9'+)?;
ID     : ('a'..'z' | 'A'..'Z')+;
SPACE  : (' ' | '\t' | '\r' | '\n') {skip();};

The gated semantic predicates inside the end rule (the { ... }?=> parts) make the rule either consume ENDALL, or merely check that ENDALL is the next token without consuming it, so an enclosing rule can still match it.
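To see the same flag-plus-look-ahead idea without ANTLR, here is a minimal hand-written sketch in plain Java. It is not the code ANTLR generates: the class name EndAllSketch, the peek/consume helpers, and the whitespace tokenizer are all assumptions made for illustration, and it only handles print and if statements (no assignments). peek() plays the role of input.LT(1), and end() mirrors the four alternatives of the end rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical hand-written parser; illustrates peek-vs-consume, not ANTLR's generated code.
final class EndAllSketch {
    private final List<String> tokens;
    private int pos = 0;
    // Same role as `flag` in the grammar: true until an enclosing `if` claims ENDALL.
    private boolean flag = true;

    private EndAllSketch(String source) {
        String trimmed = source.trim();
        tokens = trimmed.isEmpty()
            ? new ArrayList<String>()
            : Arrays.asList(trimmed.split("\\s+"));
    }

    // Parses a whole script and returns its tree in a LISP-like notation.
    static String parse(String source) {
        EndAllSketch p = new EndAllSketch(source);
        String tree = p.block();
        if (!p.peek().equals("<EOF>")) {
            throw new IllegalStateException("unexpected token: " + p.peek());
        }
        return tree;
    }

    // Look-ahead only: inspects the next token without consuming it.
    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    // Consumes the next token and returns it.
    private String consume() {
        if (pos >= tokens.size()) {
            throw new IllegalStateException("unexpected end of input");
        }
        return tokens.get(pos++);
    }

    private void expect(String token) {
        if (!peek().equals(token)) {
            throw new IllegalStateException("expected " + token + " but found " + peek());
        }
        consume();
    }

    // block : stat* ; stops at the first token that cannot start a statement.
    private String block() {
        StringBuilder sb = new StringBuilder("(BLOCK");
        while (peek().equals("print") || peek().equals("if")) {
            sb.append(' ').append(stat());
        }
        return sb.append(')').toString();
    }

    private String stat() {
        if (peek().equals("print")) {
            consume();
            return "(print " + consume() + ")";
        }
        return ifStat();
    }

    private String ifStat() {
        boolean consumeEndAll = flag;   // only the outermost open `if` gets true
        if (consumeEndAll) flag = false;
        expect("if");
        String condition = consume();
        expect("do");
        String body = block();
        end(consumeEndAll);
        if (consumeEndAll) flag = true; // restore for later top-level statements
        return "(if " + condition + " " + body + ")";
    }

    private void end(boolean consumeEndAll) {
        String next = peek();
        if (next.equals("end")) {
            consume();                        // regular terminator
        } else if (next.equals("endall")) {
            if (consumeEndAll) consume();     // nested `if`: peek only, leave the token
        } else if (!next.equals("<EOF>")) {
            throw new IllegalStateException("unexpected token: " + next);
        }
    }

    public static void main(String[] args) {
        String script1 = "if 1 do print a if 2 do print b print c if 3 do end end end print d";
        String script2 = "if 1 do print a if 2 do print b print c if 3 do endall print d";
        System.out.println(parse(script1));
        System.out.println(parse(script2));
    }
}
```

Running main prints the same tree twice, because the nested if statements leave endall in the token list and only the outermost one removes it.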

More about predicates: What is a 'semantic predicate' in ANTLR?

The parser generated from the grammar above will produce identical ASTs for both scripts 1 and 2:

Script 1

if 1 do
  print a
  if 2 do
    print b
    print c
    if 3 do
    end
  end
end
print d

Script 2

if 1 do
  print a
  if 2 do
    print b
    print c
    if 3 do
endall
print d

namely, the following AST:

[Image: diagram of the resulting AST]

(image generated using graphviz-dev.appspot.com)

You can test all of this with the following Java class:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String source =
        "if 1 do        \n" +
        "  print a      \n" +
        "  if 2 do      \n" +
        "    print b    \n" +
        "    print c    \n" +
        "    if 3 do    \n" +
        "endall         \n" +
        "print d          ";
    System.out.println(source + "\n------------------\n");
    TLexer lexer = new TLexer(new ANTLRStringStream(source));
    TParser parser = new TParser(new CommonTokenStream(lexer));
    CommonTree tree = (CommonTree)parser.parse().getTree();
    DOTTreeGenerator gen = new DOTTreeGenerator();
    StringTemplate st = gen.toDOT(tree);
    System.out.println(st);
  }
}
Answered by Bart Kiers, Sep 28 '22 10:09
