Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Javacc parser option LOOKAHEAD, Java

Tags:

java

javacc

I've recently started to play around with grammatical analyzers using javacc and one of the fields is the options one...I have a code like the folowing:

options
{
  LOOKAHEAD=1;
}
PARSER_BEGIN(Calculator)

public class Calculator
{
 ...
}
PARSER_END(Calculator)

What exactly does it mean the LOOKAHEAD option? Thanks

like image 596
out_sider Avatar asked Feb 20 '10 16:02

out_sider


2 Answers

JavaCC creates recursive descent parsers. This type of parser works by looking at the next symbol to decide which rule to choose. By default it only looks at the next symbol (lookahead=1). But you can configure the parser to look not only at the next, but also the next N symbols. If you set lookahead to 2, the generated parser will look at the next two symbols to decide which rule to choose. This way, you can define your grammar more natural, but at the cost of performance. The bigger the lookahead, the more the parser will have to do.

If you set the general lookahead to a bigger number, your parser will be slower for all inputs (for non trivial grammars). You can use lookahead locally if you want to let the parser with lookahead=1 by default and use a bigger lookahead only in specific situations.

http://www.engr.mun.ca/~theo/JavaCC-FAQ/javacc-faq-moz.htm#tth_sEc4.5

For example, a parser with lookahead=1 can't decide which of the rules (1 or 2) to take, but with lookahead=2 it can:

void rule0() : {} { 
  <ID> rule1() 
| <ID> rule2()
}

You can change the definition of the grammar to get the same result but use lookahead=1:

void rule0() : {} { 
  <ID> ( rule1() | rule2() )
}
like image 167
Arne Deutsch Avatar answered Nov 08 '22 21:11

Arne Deutsch


See http://en.wikipedia.org/wiki/Lookahead#Lookahead_in_parsing

Normally, the parser only looks at the next token to determine what production rule to apply. However, in some cases that is not enough to make the choice. For example, given two production rules:

p0: foo -> identifier "=" expr
p1: bar -> identifier "(" arglist ")"

If the next token is of type identifier then the parser cannot determine if it should use the foo or bar production. JavaCC will then give an error, saying it needs to use more look-ahead. Changing the look-ahead to 2 means the parser is allowed to look at the next two tokens, which in this case is sufficient to choose between the productions.

As Steve pointed out, this is in the javacc docs: https://javacc.org/tutorials/lookahead

like image 33
Hans W Avatar answered Nov 08 '22 20:11

Hans W