Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Once grammar is complete, what's the best way to walk an ANTLR v4 tree?

Goal

I'm working on a project to create a Varscoper for Coldfusion CFscript. Basically, this means checking through source code files to ensure that developers have properly var'd their variables.

After a couple of days of working with ANTLR V4 I have a grammar which generates a very nice parse tree in the GUI view. Now, using that tree I need a way to crawl up and down the nodes programmatically looking for variable declarations and ensure that if they are inside functions they have the proper scoping. If possible I would rather NOT do this in the grammar file as that would require mixing the definition of the language with this specific task.

What I've tried

My latest attempt was using the ParserRuleContext and attempting to go through it's children via getPayload(). After checking the class of getPayLoad() I would either have a ParserRuleContext object or a Token object. Unfortunately, using that I was never able to find a way to get the actual rule type for a specific node, only it's containing text. The rule type for each node is neccessary because it matters whether that text node is an ignored right-hand expression, a variable assignment or a function declaration.

Questions

  1. I am very new to ANTLR, is this even the right approach, or is there a better way to traverse the tree?

Here's my sample java code:

Cfscript.java

import org.antlr.v4.runtime.*; import org.antlr.v4.runtime.tree.Trees;  public class Cfscript {     public static void main(String[] args) throws Exception {         ANTLRInputStream input = new ANTLRFileStream(args[0]);         CfscriptLexer lexer = new CfscriptLexer(input);         CommonTokenStream tokens = new CommonTokenStream(lexer);         CfscriptParser parser = new CfscriptParser(tokens);         parser.setBuildParseTree(true);         ParserRuleContext tree = parser.component();         tree.inspect(parser); // show in gui         /*             Recursively go though tree finding function declarations and ensuring all variableDeclarations are varred             but how?         */     } } 

Cfscript.g4

grammar Cfscript;  component     : 'component' keyValue* '{' componentBody '}'     ;  componentBody     : (componentElement)*     ;  componentElement     : statement     | functionDeclaration     ;  functionDeclaration     : Identifier? Identifier? 'function' Identifier argumentsDefinition '{' functionBody '}'     ;  argumentsDefinition     : '(' argumentDefinition (',' argumentDefinition)* ')'     | '()'     ;  argumentDefinition     : Identifier? Identifier? argumentName ('=' expression)?     ;   argumentName     : Identifier     ;  functionBody     : (statement)*     ;  statement     : variableStatement     | nonVarVariableStatement     | expressionStatement     ;  variableStatement     : 'var' variableName '=' expression ';'     ;  nonVarVariableStatement     : variableName '=' expression ';'     ;  expressionStatement     : expression ';'     ;  expression     : assignmentExpression     | arrayLiteral     | objectLiteral     | StringLiteral     | incrementExpression     | decrementExpression     | 'true'      | 'false'     | Identifier     ;  incrementExpression     : variableName '++'     ;  decrementExpression     : variableName '--'     ;  assignmentExpression     : Identifier (assignmentExpressionSuffix)*     | assignmentExpression (('+'|'-'|'/'|'*') assignmentExpression)+     ;  assignmentExpressionSuffix     : '.' assignmentExpression     | ArrayIndex     | ('()' | '(' expression (',' expression)* ')' )     ;  methodCall     : Identifier ('()' | '(' expression (',' expression)* ')' )     ;  variableName     : Identifier (variableSuffix)*     ;  variableSuffix     : ArrayIndex     | '.' variableName     ;  arrayLiteral     : '[' expression (',' expression)* ']'     ;  objectLiteral     : '{' (Identifier '=' expression (',' Identifier '=' expression)*)? '}'     ;  keyValue     : Identifier '=' StringLiteral     ;  StringLiteral     :  '"' (~('\\'|'"'))* '"'     ;   ArrayIndex     : '[' [1-9] [0-9]* ']'     | '[' StringLiteral ']'     ;  Identifier     : [a-zA-Z0-9]+     ;  WS     : [ \t\r\n]+ -> skip      ;  COMMENT      : '/*' .*? '*/'  -> skip     ; 

Test.cfc (testing code file)

component something = "foo" another = "more" persistent = "true" datasource = "#application.env.dsn#" {     var method = something.foo.test1;     testing = something.foo[10];     testingagain = something.foo["this is a test"];     nuts["testing"]++;     blah.test().test3["test"]();      var math = 1 + 2 - blah.test().test4["test"];      var test = something;     var testing = somethingelse;     var testing = {          test = more,          mystuff = {              interior = test          },         third = "third key"     };     other = "Idunno homie";     methodCall(interiorMethod());      public function bar() {         var new = "somebody i used to know";         something = [1, 2, 3];     }      function nuts(required string test1 = "first", string test = "second", test3 = "third") {      }      private boolean function baz() {         var this = "something else";     } } 
like image 219
Nucleon Avatar asked Feb 24 '13 08:02

Nucleon


People also ask

What grammar does ANTLR use?

ANTLR 2 accepts three types of grammar specifications -- parsers, lexers, and tree-parsers (also called tree-walkers). Because ANTLR 2 uses LL(k) analysis for all three grammar variants, the grammar specifications are similar, and the generated lexers and parsers behave similarly.

Is ANTLR LL or LR?

In computer-based language recognition, ANTLR (pronounced antler), or ANother Tool for Language Recognition, is a parser generator that uses LL(*) for parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set (PCCTS), first developed in 1989, and is under active development.

What is ANTLR v4?

ANTLR v4 is a powerful tool used for building new programming languages and processing/translating structured text or binary files. ANTLR uses a grammar you create to generate a parser which can build and traverse a parse tree (or abstract syntax tree, AST).

What is a terminal in ANTLR?

An Antlr Grammar Nonterminals in Antlr have to be lowercase: root , html , normal , italic . Terminals are either quoted strings, like '<i>' , or capitalized names, like EOF and TEXT . root : html EOF; root is the entry point of the grammar. This is the nonterminal that the whole input needs to match.


1 Answers

I wouldn't walk this manually if I were you. After generating a lexer and parser, ANTLR would also have generated a file called CfscriptBaseListener that has empty methods for all of your parser rules. You can let ANTLR walk your tree and attach a custom tree-listener in which you override only those methods/rules you're interested in.

In your case, you probably want to be notified whenever a new function is created (to create a new scope) and you'll probably be interested in variable assignments (variableStatement and nonVarVariableStatement). Your listener, let's call is VarListener will keep track of all scopes as ANTLR walks the tree.

I did change 1 rule slightly (I added objectLiteralEntry):

objectLiteral     : '{' (objectLiteralEntry (',' objectLiteralEntry)*)? '}'     ;  objectLiteralEntry     : Identifier '=' expression     ;     

which makes life easier in the following demo:

VarListener.java

public class VarListener extends CfscriptBaseListener {      private Stack<Scope> scopes;      public VarListener() {         scopes = new Stack<Scope>();         scopes.push(new Scope(null));     }       @Override     public void enterVariableStatement(CfscriptParser.VariableStatementContext ctx) {         String varName = ctx.variableName().getText();         Scope scope = scopes.peek();         scope.add(varName);     }      @Override     public void enterNonVarVariableStatement(CfscriptParser.NonVarVariableStatementContext ctx) {         String varName = ctx.variableName().getText();         checkVarName(varName);     }      @Override     public void enterObjectLiteralEntry(CfscriptParser.ObjectLiteralEntryContext ctx) {         String varName = ctx.Identifier().getText();         checkVarName(varName);     }      @Override     public void enterFunctionDeclaration(CfscriptParser.FunctionDeclarationContext ctx) {         scopes.push(new Scope(scopes.peek()));     }      @Override     public void exitFunctionDeclaration(CfscriptParser.FunctionDeclarationContext ctx) {         scopes.pop();             }      private void checkVarName(String varName) {         Scope scope = scopes.peek();         if(scope.inScope(varName)) {             System.out.println("OK   : " + varName);         }         else {             System.out.println("Oops : " + varName);         }     } } 

A Scope object could be as simple as:

Scope.java

class Scope extends HashSet<String> {      final Scope parent;      public Scope(Scope parent) {         this.parent = parent;     }      boolean inScope(String varName) {         if(super.contains(varName)) {             return true;         }         return parent == null ? false : parent.inScope(varName);     } } 

Now, to test this all, here's a small main class:

Main.java

import org.antlr.v4.runtime.*; import org.antlr.v4.runtime.tree.*;  public class Main {      public static void main(String[] args) throws Exception {          CfscriptLexer lexer = new CfscriptLexer(new ANTLRFileStream("Test.cfc"));         CfscriptParser parser = new CfscriptParser(new CommonTokenStream(lexer));         ParseTree tree = parser.component();         ParseTreeWalker.DEFAULT.walk(new VarListener(), tree);     } } 

If you run this Main class, the following will be printed:

Oops : testing Oops : testingagain OK   : test Oops : mystuff Oops : interior Oops : third Oops : other Oops : something

Without a doubt, this is not exactly what you want and I probably goofed up some scoping rules of Coldfusion. But I think this will give you some insight in how to solve your problem properly. I think the code is pretty self explanatory, but if this is not the case, don't hesitate to ask for clarification.

HTH

like image 121
Bart Kiers Avatar answered Sep 19 '22 13:09

Bart Kiers