
Using ANTLR to get identifiers and function names

Tags: java, c, antlr

I'm trying to use and understand ANTLR, which is new to me. My goal is to read a source file written in C and extract its identifiers (variable and function names).

My C grammar (file C.g4) contains these rules:

identifierList
    :   Identifier
    |   identifierList Comma Identifier
    ;
Identifier
    :   IdentifierNondigit
        (   IdentifierNondigit
        |   Digit
        )*
    ;

After generating the parser and listener, I created my own listener for the identifierList rule.

Note that the MyCListener class extends CBaseListener:

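import java.util.List;
import org.antlr.v4.runtime.tree.ParseTree;

public class MyCListener extends CBaseListener {

    // Prints the text of every direct child of each identifierList node.
    @Override
    public void enterIdentifierList(CParser.IdentifierListContext ctx) {
        List<ParseTree> children = ctx.children;
        for (ParseTree parseTree : children) {
            System.out.println(parseTree.getText());
        }
    }
}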

Then I have this in my main class:

 String fileurl = "C:/example.c";

 CLexer lexer;
 try {
       lexer = new CLexer(new ANTLRFileStream(fileurl));
       CommonTokenStream tokens = new CommonTokenStream(lexer);
       CParser parser = new CParser(tokens);

       CParser.IdentifierListContext identifierContext = parser.identifierList();
       ParseTreeWalker walker = new ParseTreeWalker();
       MyCListener listener = new MyCListener();
       walker.walk(listener, identifierContext);

 } catch (IOException ex) {
       Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
 }

Where example.c is:

int main() {

// this is C

 int i=0; // i is int
 /* double j=0.0;
    C
 */
}

What am I doing wrong? Maybe I didn't write MyCListener properly, or identifierList isn't the rule I need to listen to. I really don't know. I'm sorry, but I don't even understand my output. Why is there a lexical error?

line 3:4 mismatched input '(' expecting {<EOF>, ','}
main
(
)
{
int
i
=
0
;
}

As you can see, I'm very confused about this. Can somebody help me, please?

MariaH asked Oct 22 '25 05:10


1 Answer

With this line:

CParser.IdentifierListContext identifierContext = parser.identifierList();

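you're trying to parse your entire input as an identifierList. But your input isn't just that: it is a complete C source file. After matching an identifier, the identifierList rule only accepts a ',' or the end of the input, so the parser reports the '(' as mismatched. Note that this message comes from the parser, so it is a syntax error rather than a lexical one.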

Assuming you're using the C.g4 from the ANTLR4 GitHub repository, let the parser start at the entry point of the grammar (which is the rule compilationUnit):

MyCListener listener = new MyCListener();
ParseTreeWalker.DEFAULT.walk(listener, parser.compilationUnit());
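To see why identifierList cannot serve as the start rule, here is a minimal hand-written sketch of what that rule accepts. It does not use ANTLR at all: the class IdentifierListSketch, its tokenizer, and its messages are hypothetical and only mimic the shape of the rule, ignoring C keywords and comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdentifierListSketch {

    // Simplified tokens: an identifier, or any single non-space character.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_][A-Za-z_0-9]*|\\S");
    private static final Pattern IDENTIFIER =
            Pattern.compile("[A-Za-z_][A-Za-z_0-9]*");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    private static boolean isIdentifierAt(List<String> tokens, int pos) {
        return pos < tokens.size() && IDENTIFIER.matcher(tokens.get(pos)).matches();
    }

    // Mirrors: identifierList : Identifier | identifierList Comma Identifier ;
    // Returns "ok", or a message describing the first token that does not fit.
    static String check(String input) {
        List<String> tokens = tokenize(input);
        int pos = 0;
        if (!isIdentifierAt(tokens, pos)) {
            return "expected an identifier at token " + pos;
        }
        pos++;
        while (pos < tokens.size()) {
            // After an identifier, only ',' (or the end of input) is allowed.
            if (!tokens.get(pos).equals(",")) {
                return "mismatched input '" + tokens.get(pos)
                        + "' expecting end of input or ','";
            }
            pos++;
            if (!isIdentifierAt(tokens, pos)) {
                return "expected an identifier at token " + pos;
            }
            pos++;
        }
        return "ok";
    }

    public static void main(String[] args) {
        System.out.println(check("a, b, c"));   // a plain identifier list
        System.out.println(check("main() {"));  // the start of a function definition
    }
}
```

The first call is accepted, while the second stops at '(' for the same reason the generated parser does: a function definition is not a comma-separated list of identifiers, so it has to be parsed through compilationUnit instead.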

EDIT

Here's a quick demo:

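import java.util.ArrayList;
import java.util.List;

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

public class Main {

    public static void main(String[] args) throws Exception {

        final List<String> identifiers = new ArrayList<String>();

        String source = "int main() {\n" +
                "\n" +
                "// this is C\n" +
                "\n" +
                " int i=0; // i is int\n" +
                " /* double j=0.0;\n" +
                "    C\n" +
                " */\n" +
                "}";

        // ANTLRInputStream is deprecated since ANTLR 4.7;
        // CharStreams.fromString(source) is the replacement there.
        CLexer lexer = new CLexer(new ANTLRInputStream(source));
        CParser parser = new CParser(new CommonTokenStream(lexer));

        // Walk the whole tree, starting at the grammar's entry rule.
        ParseTreeWalker.DEFAULT.walk(new CBaseListener(){

            // Declared names (functions and variables) show up in directDeclarator.
            @Override
            public void enterDirectDeclarator(CParser.DirectDeclaratorContext ctx) {
                if (ctx.Identifier() != null) {
                    identifiers.add(ctx.Identifier().getText());
                }
            }

            // Perhaps override other rules that use `Identifier`

        }, parser.compilationUnit());

        System.out.println("identifiers -> " + identifiers);
    }
}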

which would print:

identifiers -> [main, i]
Bart Kiers answered Oct 23 '25 22:10


