ANTLR4: how to build a grammar that allows keywords as identifiers

Here is some demo code:

label:
var id
let id = 10
goto label

If keywords were allowed as identifiers, it could be written as:

let:
var var
let var = 10
goto let

This is perfectly legal code, but it seems very hard to support in ANTLR.

AFAIK, once the ANTLR lexer matches the token let, it never falls back to the ID token, so ANTLR sees:

LET_TOKEN :
VAR_TOKEN <missing ID_TOKEN>VAR_TOKEN
LET_TOKEN <missing ID_TOKEN>VAR_TOKEN = 10

Although ANTLR allows predicates, I would have to control every token match, which is error-prone. The grammar becomes this:

grammar Demo;
options {
  language = Go;
}
@parser::members{
    var _need = map[string]bool{}
    func skip(name string,v bool){
        _need[name] = !v
        fmt.Println("SKIP",name,v)
    }
    func need(name string)bool{
        fmt.Println("NEED",name,_need[name])
        return _need[name]
    }
}

proj@init{skip("inst",false)}: (line? NL)* EOF;
line
    : VAR ID
    | LET ID EQ? Integer
    ;

NL: '\n';
VAR: {need("inst")}? 'var' {skip("inst",true)};
LET: {need("inst")}? 'let' {skip("inst",true)};
EQ: '=';

ID: ([a-zA-Z] [a-zA-Z0-9]*);
Integer: [0-9]+;

WS: [ \t] -> skip;

It looks terrible.

But this is easy in PEG. You can test this in PEG.js:

Expression = (Line? _ '\n')* ;

Line
  = 'var' _ ID
  / 'let' _ ID _ "=" _ Integer

Integer "integer"
  = [0-9]+ { return parseInt(text(), 10); }

ID = [a-zA-Z] [a-zA-Z0-9]*

_ "whitespace"
  = [ \t]*

I have actually done this in peggo and JavaCC.

My question is how to handle such grammars in ANTLR 4.6. I was excited about the ANTLR 4.6 Go target, but it seems I chose the wrong tool for my grammar?

wener asked Jan 02 '17 05:01


1 Answer

The simplest way is to define a parser rule for identifiers:

id: ID | VAR | LET;

VAR: 'var';
LET: 'let';
ID: [a-zA-Z] [a-zA-Z0-9]*;

And then use id instead of ID in your parser rules.
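
To illustrate why this works, here is a minimal hand-written sketch in Go. It is not ANTLR-generated code, only a stand-in that mimics the same split: the lexer always emits a keyword token for var and let, and the isID check, playing the role of the id parser rule, accepts those tokens wherever an identifier is expected. All names (lex, isID, parseLine) are hypothetical, and the label and goto handling is an assumption added so the demo program from the question can be covered.

```go
package main

import (
	"fmt"
	"strings"
)

// Token kinds. VAR, LET and ID mirror the lexer rules in the answer;
// GOTO, INT, EQ and COLON are assumptions added to cover the demo program.
const (
	VAR   = "VAR"
	LET   = "LET"
	GOTO  = "GOTO"
	ID    = "ID"
	INT   = "INT"
	EQ    = "EQ"
	COLON = "COLON"
)

type token struct{ kind, text string }

// keywords maps fixed words and symbols to their token kind.
var keywords = map[string]string{"var": VAR, "let": LET, "goto": GOTO, "=": EQ, ":": COLON}

// isInt reports whether w consists only of ASCII digits.
func isInt(w string) bool {
	for _, r := range w {
		if r < '0' || r > '9' {
			return false
		}
	}
	return w != ""
}

// lex classifies each word of a line. A keyword always becomes its keyword
// token and never ID. Identifier syntax is not validated in this sketch.
func lex(line string) []token {
	line = strings.ReplaceAll(line, ":", " : ")
	line = strings.ReplaceAll(line, "=", " = ")
	var toks []token
	for _, w := range strings.Fields(line) {
		kind := ID
		if k, ok := keywords[w]; ok {
			kind = k
		} else if isInt(w) {
			kind = INT
		}
		toks = append(toks, token{kind, w})
	}
	return toks
}

// isID stands in for the parser rule `id: ID | VAR | LET;`
// (extended with GOTO, which is an assumption).
func isID(t token) bool {
	return t.kind == ID || t.kind == VAR || t.kind == LET || t.kind == GOTO
}

// parseLine recognises one statement and returns a short description of it.
func parseLine(line string) (string, error) {
	t := lex(line)
	switch {
	case len(t) == 2 && isID(t[0]) && t[1].kind == COLON:
		return "label " + t[0].text, nil
	case len(t) == 2 && t[0].kind == VAR && isID(t[1]):
		return "declare " + t[1].text, nil
	case len(t) == 4 && t[0].kind == LET && isID(t[1]) && t[2].kind == EQ && t[3].kind == INT:
		return "assign " + t[1].text + " = " + t[3].text, nil
	case len(t) == 2 && t[0].kind == GOTO && isID(t[1]):
		return "jump " + t[1].text, nil
	}
	return "", fmt.Errorf("cannot parse %q", line)
}

func main() {
	for _, l := range []string{"let:", "var var", "let var = 10", "goto let"} {
		s, err := parseLine(l)
		fmt.Println(s, err)
	}
}
```

The point of the sketch is that the lexer stays context-free: the decision whether let is a keyword or a name is made in the parser, based on the position of the token in the statement.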

A different way is to use ID for identifiers and keywords, and use predicates for disambiguation. But it's less readable, so I'd use the first way instead.
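
The idea behind this second approach can also be sketched by hand in Go. This is an illustration only, not ANTLR syntax or generated code: the stand-in lexer has no keyword rules at all, and the parser inspects the text of a word only in the position where a keyword is allowed, which is the job a semantic predicate would do in the grammar. The names words and classify are hypothetical, the label and goto forms are assumptions taken from the question's demo program, and validation of identifiers and integers is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// words is a stand-in lexer without keyword rules: every word is plain text,
// the way a grammar with a single ID token would see it.
func words(line string) []string {
	line = strings.ReplaceAll(line, ":", " : ")
	line = strings.ReplaceAll(line, "=", " = ")
	return strings.Fields(line)
}

// classify checks the text of the first word only where a keyword may
// appear. The label case comes first, so "let:" is a label, not a statement.
func classify(line string) string {
	w := words(line)
	switch {
	case len(w) == 2 && w[1] == ":":
		return "label " + w[0]
	case len(w) == 2 && w[0] == "var":
		return "declare " + w[1]
	case len(w) == 4 && w[0] == "let" && w[2] == "=":
		return "assign " + w[1] + " = " + w[3]
	case len(w) == 2 && w[0] == "goto":
		return "jump " + w[1]
	}
	return "error"
}

func main() {
	for _, l := range []string{"let:", "var var", "let var = 10", "goto let"} {
		fmt.Println(classify(l))
	}
}
```

Every text comparison here corresponds to one predicate in the grammar, which gives a feel for why this variant tends to be harder to read than the id rule.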

Lucas Trzesniewski answered Sep 27 '22 21:09