Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to implement JJTree on grammar

I have an assignment to use JavaCC to make a Top-Down Parser with Semantic Analysis for a language supplied by the lecturer. I have the production rules written out and no errors. I'm completely stuck on how to use JJTree for my code and my hours of scouring the internet for tutorials hasn't gotten me anywhere. Just wondering could anyone take some time out to explain how to implement JJTree in the code? Or if there's a hidden step-by-step tutorial out there somewhere that would be a great help!

Here are some of my production rules in case they help. Thanks in advance!

void program() : {}
{
  (decl())* (function())* main_prog()
}

void decl() #void : {}
{
  (
    var_decl() | const_decl()
   )
}

void var_decl() #void : {}
{
  <VAR> ident_list() <COLON> type()
 (<COMMA> ident_list() <COLON> type())* <SEMIC>
}

void const_decl()  #void : {}
{
  <CONSTANT> identifier() <COLON> type() <EQUAL> expression()
 ( <COMMA> identifier() <COLON> type() <EQUAL > expression())* <SEMIC>
} 

void function() #void : {}
{
  type() identifier() <LBR> param_list() <RBR>
  <CBL>
  (decl())*
  (statement() <SEMIC> )*
  returnRule() (expression() | {} )<SEMIC>
  <CBR>
}
like image 328
Aimee Jones Avatar asked Dec 16 '12 14:12

Aimee Jones


People also ask

What is JJTree?

JJTree defines a Java interface Node that all parse tree nodes must implement. The interface provides methods for operations such as setting the parent of the node, and for adding children and retrieving them. JJTree operates in one of two modes, simple and multi (for want of better terms).

How do I use lookahead in JavaCC?

You can set a global LOOKAHEAD specification by using the option "LOOKAHEAD" either from the command line, or at the beginning of the grammar file in the options section. The value of this option is an integer which is the number of tokens to look ahead when making choice decisions.

How do parser generators work?

A parser generator takes a grammar as input and automatically generates source code that can parse streams of characters using the grammar. The generated code is a parser, which takes a sequence of characters and tries to match the sequence against the grammar.


2 Answers

Creating an AST using JavaCC looks a lot like creating a "normal" parser (defined in a jj file). If you already have a working grammar, it's (relatively) easy :)

Here are the steps needed to create an AST:

  1. rename your jj grammar file to jjt
  2. decorate it with root-labels (the italic words are my own terminology...)
  3. invoke jjtree on your jjt grammar, which will generate a jj file for you
  4. invoke javacc on your generated jj grammar
  5. compile the generated java source files
  6. test it

Here's a quick step-by-step tutorial, assuming you're using MacOS or *nix, have the javacc.jar file in the same directory as your grammar file(s) and java and javac are on your system's PATH:

1

Assuming your jj grammar file is called TestParser.jj, rename it:

mv TestParser.jj TestParser.jjt

2

Now the tricky part: decorating your grammar so that the proper AST structure is created. You decorate an AST (or node, or production rule (all the same)) by adding a # followed by an identifier after it (and before the :). In your original question, you have a lot of #void in different productions, meaning you're creating the same type of AST's for different production rules: this is not what you want.

If you don't decorate your production, the name of the production is used as the type of the node (so, you can remove the #void):

void decl() :
{}
{
     var_decl()
  |  const_decl()
}

Now the rule simply returns whatever AST the rule var_decl() or const_decl() returned.

Let's now have a look at the (simplified) var_decl rule:

void var_decl() #VAR :
{}
{
  <VAR> id() <COL> id() <EQ> expr() <SCOL>
}

void id() #ID :
{}
{
  <ID>
}

void expr() #EXPR :
{}
{
  <ID>
}

which I decorated with the #VAR type. This now means that this rule will return the following tree structure:

    VAR 
   / | \
  /  |  \
ID  ID  EXPR

As you can see, the terminals are discarded from the AST! This also means that the id and expr rules loose the text their <ID> terminal matched. Of course, this is not what you want. For the rules that need to keep the inner text the terminal matched, you need to explicitly set the .value of the tree to the .image of the matched terminal:

void id() #ID :
{Token t;}
{
  t=<ID> {jjtThis.value = t.image;}
}

void expr() #EXPR :
{Token t;}
{
  t=<ID> {jjtThis.value = t.image;}
}

causing the input "var x : int = i;" to look like this:

       VAR 
        |
    .---+------.
   /    |       \
  /     |        \
ID["x"] ID["int"] EXPR["i"]

This is how you create a proper structure for your AST. Below follows a small grammar that is a very simple version of your own grammar including a small main method to test it all:

// TestParser.jjt
PARSER_BEGIN(TestParser)

public class TestParser {
  public static void main(String[] args) throws ParseException {
    TestParser parser = new TestParser(new java.io.StringReader(args[0]));
    SimpleNode root = parser.program();
    root.dump("");
  }
}

PARSER_END(TestParser)

TOKEN :
{
   < OPAR  : "(" > 
 | < CPAR  : ")" >
 | < OBR   : "{" >
 | < CBR   : "}" >
 | < COL   : ":" >
 | < SCOL  : ";" >
 | < COMMA : "," >
 | < VAR   : "var" >
 | < EQ    : "=" > 
 | < CONST : "const" >
 | < ID    : ("_" | <LETTER>) ("_" | <ALPHANUM>)* >
}

TOKEN :
{
   < #DIGIT    : ["0"-"9"] >
 | < #LETTER   : ["a"-"z","A"-"Z"] >
 | < #ALPHANUM : <LETTER> | <DIGIT> >
}

SKIP : { " " | "\t" | "\r" | "\n" }

SimpleNode program() #PROGRAM :
{}
{
  (decl())* (function())* <EOF> {return jjtThis;}
}

void decl() :
{}
{
     var_decl()
  |  const_decl()
}

void var_decl() #VAR :
{}
{
  <VAR> id() <COL> id() <EQ> expr() <SCOL>
}

void const_decl() #CONST :
{}
{
  <CONST> id() <COL> id() <EQ> expr() <SCOL>
}


void function() #FUNCTION :
{}
{
  type() id() <OPAR> params() <CPAR> <OBR> /* ... */ <CBR>
}

void type() #TYPE :
{Token t;}
{
  t=<ID> {jjtThis.value = t.image;}
}

void id() #ID :
{Token t;}
{
  t=<ID> {jjtThis.value = t.image;}
}

void params() #PARAMS :
{}
{
  (param() (<COMMA> param())*)?
}

void param() #PARAM :
{Token t;}
{
  t=<ID> {jjtThis.value = t.image;}
}

void expr() #EXPR :
{Token t;}
{
  t=<ID> {jjtThis.value = t.image;}
}

3

Let the jjtree class (included in javacc.jar) create a jj file for you:

java -cp javacc.jar jjtree TestParser.jjt

4

The previous step has created the file TestParser.jj (if everything went okay). Let javacc (also present in javacc.jar) process it:

java -cp javacc.jar javacc TestParser.jj

5

To compile all source files, do:

javac -cp .:javacc.jar *.java

(on Windows, do: javac -cp .;javacc.jar *.java)

6

The moment of truth has arrived: let's see if everything actually works! To let the parser process the input:

var n : int = I; 

const x : bool = B; 

double f(a,b,c) 
{ 
}

execute the following:

java -cp . TestParser "var n : int = I; const x : bool = B; double f(a,b,c) { }"

and you should see the following being printed on your console:

PROGRAM
 decl
  VAR
   ID
   ID
   EXPR
 decl
  CONST
   ID
   ID
   EXPR
 FUNCTION
  TYPE
  ID
  PARAMS
   PARAM
   PARAM
   PARAM

Note that you don't see the text the ID's matched, but believe me, they're there. The method dump() simply does not show it.

HTH

EDIT

For a working grammar including expressions, you could have a look at the following expression evaluator of mine: https://github.com/bkiers/Curta (the grammar is in src/grammar). You might want to have a look at how to create root-nodes in case of binary expressions.

like image 118
Bart Kiers Avatar answered Nov 01 '22 21:11

Bart Kiers


Here is an example that uses JJTree http://anandsekar.github.io/writing-an-interpretter-using-javacc/

like image 32
Anand Rajasekar Avatar answered Nov 01 '22 21:11

Anand Rajasekar