
Understanding trees in ANTLR

Tags: java, antlr

I'm trying to use Antlr for some text IDE-like functions -- specifically parsing a file to identify the points for code folding, and for applying syntax highlighting.

First question - is Antlr suitable for this requirement, or is it overkill? This could be achieved using regex and/or a hand-rolled parser ... but it seems that Antlr is out there to do this work for me.

I've had a look through the ... and the excellent tutorial resource here.

I've managed to get a Java grammar built (using the standard grammar), and get everything parsed neatly into a tree. However, I'd have expected to see elements nested within the tree. In actual fact, everything is a child of the very top element.

E.g., given:

package com.example
public class Foo {
   String myString = "Hello World"
   // etc
}

I'd have expected the tree node for Foo to be a child of the node for the package declaration. Likewise, myString would be a child of Foo.

Instead, I'm finding that Foo and myString (and everything else for that matter) are all children of package.

Here's the relevant excerpt doing the parsing:

public void init() throws Exception {
    CharStream c = new ANTLRFileStream(
            "src/com/inversion/parser/antlr/Test.code");

    Lexer lexer = new JavaLexer(c);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    JavaParser parser = new JavaParser(tokens);
    parser.setTreeAdaptor(adaptor);

    // The return-scope class is nested inside the generated parser.
    JavaParser.compilationUnit_return result = parser.compilationUnit();
}

static final TreeAdaptor adaptor = new CommonTreeAdaptor() {
    @Override
    public Object create(Token payload) {
        if (payload != null) {
            // Log each token as its tree node is created.
            System.out.println("Create " + JavaParser.tokenNames[payload.getType()]
                    + ":  L" + payload.getLine()
                    + ":C" + payload.getCharPositionInLine()
                    + " " + payload.getText());
        }
        return new CommonTree(payload);
    }
};

Calling result.getTree() returns a CommonTree instance whose children are the result of the parsing.

Expected value (perhaps incorrectly)

package com.example (4 tokens)
   |
   +-- public class Foo (3 tokens)
        |
        +--- String myString = "Hello World" (4 tokens)
        +--- Comment "// etc"

(or something similar)

Actual value (all values are children of the root node of result.getTree())

package
com
.
example
public
class
Foo
String
myString
=
"Hello World"

Is my understanding of how this should be working correct?

I'm a complete noob at Antlr so far, and I'm finding the learning curve quite steep.

Marty Pitt asked Nov 24 '09 14:11


2 Answers

The Java-6 grammar at the top of the file sharing section on antlr.org does not include tree building. You'll need to do two things. First, tell ANTLR you want to build an AST:

options {
    output=AST;
}

Second, you need to tell it what the tree should look like, using either the tree operators (^ to make a token the root of its subtree, ! to leave a token out of the tree) or rewrite rules (->). See the documentation on tree construction. I usually end up using a combination of both.
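
To make the difference concrete, here is a toy model in plain Java of the two shapes involved. It does not use the ANTLR runtime: Node is a hypothetical stand-in for CommonTree, and FIELD stands in for an imaginary token you would declare in the grammar yourself. The flat shape is what the question observes (every token hanging off a single root); the nested shape is roughly what tree operators or rewrite rules let you describe.

```java
import java.util.ArrayList;
import java.util.List;

public class TreeShapeSketch {

    /** Hypothetical stand-in for a tree node; this is not ANTLR's CommonTree. */
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String text) {
            this.text = text;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }

        /** LISP-style rendering: a leaf prints as its text, a parent as (text child ...). */
        String toStringTree() {
            if (children.isEmpty()) {
                return text;
            }
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Node child : children) {
                sb.append(' ').append(child.toStringTree());
            }
            return sb.append(')').toString();
        }
    }

    /** Flat shape: every token is a direct child of one root, as in the question. */
    static Node flat() {
        Node root = new Node("nil");
        for (String token : new String[] {"class", "Foo", "String", "myString"}) {
            root.add(new Node(token));
        }
        return root;
    }

    /** Nested shape: 'class' is promoted to root and the field gets its own subtree. */
    static Node nested() {
        Node field = new Node("FIELD").add(new Node("String")).add(new Node("myString"));
        return new Node("class").add(new Node("Foo")).add(field);
    }

    public static void main(String[] args) {
        System.out.println(flat().toStringTree());   // (nil class Foo String myString)
        System.out.println(nested().toStringTree()); // (class Foo (FIELD String myString))
    }
}
```

The grammar is still where the real work happens; this only shows the target structure you would be aiming for when annotating the rules.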

Kaleb Pederson answered Oct 02 '22 10:10


To build a tree, you should set output=AST (abstract syntax tree).

As far as I know, in ANTLR only one token can be the root of a tree, so you can't get exactly what you're looking for, but you can get close.

Check out: http://www.antlr.org/wiki/display/ANTLR3/Tree+construction
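
Once the tree is nested, the code-folding part of the question becomes a simple walk. The sketch below is plain Java and makes several assumptions: Node is a hypothetical class (not part of ANTLR) that already knows the first and last source line it covers, and the labels UNIT, PACKAGE, CLASS and FIELD are made up for illustration. With a real ANTLR 3 tree you would have to derive those lines from the tokens, for example via Token.getLine(), which the adaptor in the question already prints.

```java
import java.util.ArrayList;
import java.util.List;

public class FoldingSketch {

    /** Hypothetical node that records the source lines it spans; not an ANTLR class. */
    static final class Node {
        final String label;
        final int startLine;
        final int endLine;
        final List<Node> children = new ArrayList<>();

        Node(String label, int startLine, int endLine) {
            this.label = label;
            this.startLine = startLine;
            this.endLine = endLine;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Depth-first walk: every node that spans more than one line is a fold candidate. */
    static List<String> foldRegions(Node root) {
        List<String> regions = new ArrayList<>();
        collect(root, regions);
        return regions;
    }

    static void collect(Node node, List<String> regions) {
        if (node.endLine > node.startLine) {
            regions.add(node.label + " " + node.startLine + "-" + node.endLine);
        }
        for (Node child : node.children) {
            collect(child, regions);
        }
    }

    /** Mirrors the five-line example from the question (package, class, field, comment, brace). */
    static Node sample() {
        Node cls = new Node("CLASS Foo", 2, 5).add(new Node("FIELD myString", 3, 3));
        return new Node("UNIT", 1, 5).add(new Node("PACKAGE", 1, 1)).add(cls);
    }

    public static void main(String[] args) {
        for (String region : foldRegions(sample())) {
            System.out.println(region);
        }
        // Prints:
        // UNIT 1-5
        // CLASS Foo 2-5
    }
}
```

A real editor would probably skip the outermost node, since folding the whole file is rarely useful, but that is a policy choice on top of the same walk.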

rogueg answered Oct 02 '22 10:10