Understanding the context data structure in Antlr4

Q: What is a context in ANTLR?

A rule context is a record of a single rule invocation. We form a stack of these context objects using the parent pointer. A parent pointer of null indicates that the current context is the bottom of the stack. The ParserRuleContext subclass as a children list so that we can turn this data structure into a tree.

Q: What is ANTLR4 used for?

ANTLR 4 allows you to define lexer and parser rules in a single combined grammar file. This makes it really easy to get started. To get familiar with working with ANTLR, let's take a look at what a simple JSON grammar would look like and break it down. grammar Json; @header { package com.

Tags:

java

parsing

visitor-pattern

antlr4

I'm trying to write a code translator in Java with the help of Antlr4 and had great success with the grammar part so far. However I'm now banging my head against a wall wrapping my mind around the parse tree data structure that I need to work on after my input has been parsed.

I'm trying to use the visitor template to go over my parse tree. I'll show you an example to illustrate the points of my confusion.

My grammar:

grammar pqlc;

// Lexer

//Schlüsselwörter
EXISTS: 'exists';
REDUCE: 'reduce';
QUERY: 'query';
INT: 'int';
DOUBLE: 'double';
CONST: 'const';
STDVECTOR: 'std::vector';
STDMAP: 'std::map';
STDSET: 'std::set';
C_EXPR: 'c_expr';

INTEGER_LITERAL  : (DIGIT)+ ;
fragment DIGIT: '0'..'9';
DOUBLE_LITERAL : DIGIT '.' DIGIT+;

LPAREN          : '(';
RPAREN          : ')';
LBRACK          : '[';
RBRACK          : ']';
DOT             : '.';
EQUAL           : '==';
LE              : '<=';
GE              : '>=';
GT              : '>';
LT              : '<';
ADD             : '+';
MUL             : '*';
AND             : '&&';
COLON           : ':';

IDENTIFIER    :   JavaLetter JavaLetterOrDigit*;
fragment JavaLetter    :   [a-zA-Z$_]; // these are the "java letters" below 0xFF
fragment JavaLetterOrDigit    :   [a-zA-Z0-9$_]; // these are the "java letters or digits" below 0xFF
WS  
    :  [ \t\r\n\u000C]+ -> skip  
    ;
COMMENT
    :   '/*' .*? '*/' -> skip
    ;

LINE_COMMENT
    :   '//' ~[\r\n]* -> skip
    ;


// Parser

//start_rule: query;

query :
      quant_expr
      | qexpr+
      | IDENTIFIER // order IDENTIFIER and qexpr+?
      | numeral
      | c_expr //TODO

      ;

c_type : INT | DOUBLE | CONST;
bin_op: AND | ADD | MUL | EQUAL | LT | GT | LE| GE;


qexpr:
         LPAREN query RPAREN bin_op_query? 
         // query bin_op query
         | IDENTIFIER  bin_op_query? // copied from query to resolve left recursion problem
         | numeral bin_op_query?  // ^
         | quant_expr bin_op_query? // ^
           |c_expr bin_op_query?
           // query.find(query)
         | IDENTIFIER  find_query? // copied from query to resolve left recursion problem
         | numeral find_query?  // ^
         | quant_expr find_query?
           |c_expr find_query?
           // query[query]
          | IDENTIFIER  array_query? // copied from query to resolve left recursion problem
         | numeral array_query?  // ^
         | quant_expr array_query?
           |c_expr array_query?

     // | qexpr bin_op_query // bad, resolved by quexpr+ in query 
     ;

bin_op_query: bin_op query bin_op_query?; // resolve left recursion of query bin_op query

find_query: '.''find' LPAREN query RPAREN;
array_query: LBRACK query RBRACK;

quant_expr:
    quant id ':' query
          | QUERY LPAREN match RPAREN ':' query
          | REDUCE LPAREN IDENTIFIER RPAREN id ':' query
    ;

match:
         STDVECTOR LBRACK id RBRACK EQUAL cm
     | STDMAP '.''find' LPAREN cm RPAREN EQUAL cm
     | STDSET '.''find' LPAREN cm RPAREN
     ;

cm:
    IDENTIFIER
  | numeral
   | c_expr //TODO
  ;

quant :
          EXISTS;

id :
     c_type IDENTIFIER
     | IDENTIFIER // Nach Seite 2 aber nicht der Übersicht. Laut übersicht id -> aber dann wäre Regel 1 ohne +
   ;

numeral :
            INTEGER_LITERAL
        | DOUBLE_LITERAL
        ;
c_expr:
          C_EXPR
      ;

Now let's parse the following string:

double x: x >= c_expr

Visually I'll get this tree: tree

Let's say my visitor is in the visitQexpr(@NotNull pqlcParser.QexprContext ctx) routine when it hits the branch Qexpr(x bin_op_query).

My question is, how can I tell that the left children ("x" in the tree) is a terminal node, or more specifically an "IDENTIFIER"? There are no visiting rules for Terminal nodes since they aren't rules. ctx.getChild(0) has no RuleIndex. I guess I could use that to check if I'm in a terminal or not, but that still wouldn't tell me if I was in IDENTIFIER or another kind of terminal token. I need to be able to tell the difference somehow.

I had more questions but in the time it took me to write the explanation I forgot them :< Thanks in advance.

248

asked Jun 12 '14 13:06

user2323596

1 Answers

You can add labels to tokens and access them/check if they exist in the surrounding context:

id :
     c_type labelA = IDENTIFIER
     | labelB = IDENTIFIER 
   ;

You could also do this to create different visits:

id :
     c_type IDENTIFIER    #idType1 //choose more appropriate names!
     | IDENTIFIER         #idType2
   ;

This will create different visitors for the two alternatives and I suppose (i.e. have not verified) that the visitor for id will not be called.

I prefer the following approach though:

id :
        typeDef
     |  otherId
     ;
typeDef: c_type IDENTIFIER;
otherId : IDENTIFIER ;

This is a more heavily typed system. But you can very specifically visit nodes. Some rules of thumb I use:

Use | only when all alternatives are parser rules.
Wrap each Token in a parser rule (like otherId) to give them "more meaning".
It's ok to mix parser rules and tokens, if the tokens are not really important (like ;) and therefore not needed in the parse tree.

154

answered Sep 28 '22 19:09

Onur

Related questions
                            
                                Java API for running UIMA Ruta scripts
                            
                                Determine the optimal size for array with respect to the JVM's memory granularity
                            
                                Short filename causing issues with two otherwise identical Paths
                            
                                Reconnecting to Remote Akka System after Restarting the Client
                            
                                Android: Getting white screen with fragment transaction
                            
                                Specify message converters for single Spring Controller
                            
                                Android app post everyThing except posted file
                            
                                What's the difference: ConcurrentUpdateSolrServer vs HttpSolrServer vs CommonsHttpSolrServer?
                            
                                How can I make a case-insensitive pattern for java.sql.DatabaseMetaData#getTables?
                            
                                Can Jenkins custom plugin on slave log to master?
                            
                                Alternatives for instanceof when processing concrete types in Java
                            
                                How do we force Resource(controller) level transaction in Dropwizard with jdbi?
                            
                                How to mock super reference (on super class)?
                            
                                How to copy InputStream to AsynchronousFileChannel
                            
                                Lint gives "Wrong format type" when using long values in strings.xml
                            
                                How can I set a WebSphere classloader policy to PARENT_LAST for an app created in Eclipse?
                            
                                "Annotated" lambda expression instead of interface with single apply method [duplicate]
                            
                                How to use java memory histogram "jmap"
                            
                                What exactly does specifying "AES" as the algorithm in KeyGenerator.getInstance() do?
                            
                                Spring & wildfly async requests

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With