Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Infinite recursion in ANTLR grammar

I writing a simple grammar to recognize some expressions. Here, I'm posting a simpler version of it, that I wrote just to simplify my explanation. This simpler version can recognize expressions like:

  1. this is a text
  2. [n]this is another text[/n]
  3. [n][n]this is a compound expression[/n][/n]

My problem is when I sumbmit a expression like: [i]this should generate just a recognition exception[/n]

A recognition exception is thrown, but the parser enters in a infinte recursion, because it matches '[', but when it not matches 'i' it loses itself. I think that is happening because my text component of the grammar can not contain square brackets. So, I'm posting the grammar.

grammar ErrorTest;

expression
    :    rawText EOF
    |    command EOF
    ;

rawText
    :    word+
    ;

word
    :    ESPACE* TEXT ESPACE*
    ;

command 
    :    simpleCommand
    |    compoundCommand
    ;

simpleCommand
    :    HELP
    ;

compoundCommand
    :    rawText
    |    BEGIN compoundCommand END
    ;

HELP   : '[help]';

BEGIN  : '[n]';
END    : '[/n]';

ESPACE : ' ';
TEXT   : ~(' '|'['|']')*;

How can I solve it?

like image 697
davidbuzatto Avatar asked Apr 25 '12 23:04

davidbuzatto


1 Answers

word matches the empty string because in

word
    :    ESPACE* TEXT ESPACE*
    ;

TEXT matches the empty string which causes

rawText
    :    word+
    ;

to loop infinitely.

Change

TEXT   : ~(' '|'['|']')*;

to

TEXT   : ~(' '|'['|']')+;

which will make your grammar only finitely ambiguous.

The way to think about this is that rawText can match the empty string in many ways

  1. Zero TEXT tokens
  2. One TEXT token with length 0.
  3. Two TEXT tokens with length 0.
  4. Three TEXT tokens with length 0.
  5. ...

This manifests when you have a syntactic error ([i]) because it tries each of these alternatives to see if any of them resolve the error.


To get rid of any quadratic behavior, you should really make it completely unambiguous.

rawText : ign (word (ign word)*)? ign;
ign     : ESPACE*;
word    : TEXT;

The problem with the naive fix is that rawText can match "foo" in several ways:

  1. TEXT("foo")
  2. TEXT("fo"), ESPACE(""), TEXT("o")
  3. TEXT("f"), ESPACE(""), TEXT("oo")
  4. TEXT("f"), ESPACE(""), TEXT("o"), ESPACE(""), TEXT("o")
like image 55
Mike Samuel Avatar answered Oct 16 '22 08:10

Mike Samuel