
"skip" changes parser behavior

Tags: antlr, antlr4

Adding skip to a rule doesn't do what I expect. Here's a grammar for a pair of tokens separated by a comma and a space. I made one version where the comma is marked skip, and one where it isn't:

grammar Commas;

COMMA:          ', ';
COMMASKIP:      ', ' -> skip;
DATA:           ~[, \n]+;

withoutSkip:    data COMMA data '\n';
withSkip:       data COMMASKIP data '\n';
data:           DATA;

Testing the rule without skip works as expected:

$ echo 'a, b' | grun Commas withoutSkip -tree
(withoutSkip (data a) ,  (data b) \n)

With skip gives me an error:

$ echo 'a, b' | grun Commas withSkip -tree
line 1:1 mismatched input ', ' expecting COMMASKIP
(withSkip (data a) ,  b \n)

If I comment out the COMMA and withoutSkip rules I get this:

$ echo 'a, b' | grun Commas withSkip -tree
line 1:3 missing ', ' at 'b'
(withSkip (data a) <missing ', '> (data b) \n)

I am trying to get output that just has the data tokens without the comma, like this:

(withSkip (data a) (data b) \n)

What am I doing wrong?

Dan Lipsitt asked Feb 17 '23 09:02


1 Answer

skip causes the lexer to discard the token. Therefore, a skipped lexer rule cannot be used in parser rules.

Also, when two or more lexer rules match the same input, the rule defined first wins, regardless of which token the parser is expecting at that point. (The lexer prefers the longest match; order of definition only breaks ties.) In your case, a COMMASKIP token will never be created, because COMMA is defined first and matches the same input.
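To make the tie-break concrete, here is a small toy model in plain Java. It is only a sketch of the selection rule described above (longest match, then first-defined), not ANTLR's actual implementation; the class name LexerTieBreakDemo and its methods are made up for this illustration, and the three patterns mirror the lexer rules from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of lexer rule selection; NOT ANTLR's real implementation.
class LexerTieBreakDemo {

    // Lexer rules in definition order, mirroring the question's grammar.
    static Map<String, Pattern> rules() {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("COMMA", Pattern.compile(", "));
        rules.put("COMMASKIP", Pattern.compile(", "));
        rules.put("DATA", Pattern.compile("[^, \\n]+"));
        return rules;
    }

    // Returns the name of the rule that wins at position pos:
    // the longest match is preferred, and the earliest-defined rule wins a tie.
    static String winningRule(Map<String, Pattern> rules, String input, int pos) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> e : rules.entrySet()) {
            Matcher m = e.getValue().matcher(input);
            m.region(pos, input.length());
            // Strict '>' means a later rule with an equally long match never replaces an earlier one.
            if (m.lookingAt() && m.end() - pos > bestLen) {
                best = e.getKey();
                bestLen = m.end() - pos;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Pattern> rules = rules();
        System.out.println(winningRule(rules, "a, b\n", 0)); // DATA
        System.out.println(winningRule(rules, "a, b\n", 1)); // COMMA, never COMMASKIP
    }
}
```

In this model COMMASKIP can never win, because COMMA matches exactly the same text and comes first. That is consistent with the "mismatched input ', ' expecting COMMASKIP" error in the question.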

Try something like this:

grammar Commas;

COMMA : ',' -> skip;
SPACE : (' '|'\n') -> skip;
DATA  : ~[, \n]+;

data  : DATA+;

EDIT

So how do I specify where the comma goes without including it in the parse tree? Your code would match a, , b.

You don't. If the comma is significant (i.e. a,,b is invalid), it cannot be skipped in the lexer: the parser never sees skipped tokens, so it cannot check where they appear.
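A rough way to see why: once the separators are dropped, a well-formed and a malformed input produce the same token stream. The plain-Java sketch below is a toy tokenizer that only imitates the effect of skipping commas, spaces and newlines; it is not ANTLR, and the class name SkipDemo and the method dataTokens are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Toy tokenizer imitating "-> skip" on commas, spaces and newlines; NOT ANTLR itself.
class SkipDemo {

    // Keeps only the DATA-like pieces; every separator is discarded,
    // so nothing downstream can tell how many commas there were.
    static List<String> dataTokens(String input) {
        List<String> tokens = new ArrayList<>();
        for (String piece : input.split("[, \\n]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(dataTokens("a, b\n"));   // [a, b]
        System.out.println(dataTokens("a, , b\n")); // [a, b]
    }
}
```

Both inputs yield the same list, so a parser working from that list has no way to reject the second one.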

I think in antlr3 you're supposed to use an exclamation point.

ANTLR 4 does not let you build an AST during the parse, so there is no equivalent of v3's exclamation point. Instead, the parser produces a single parse tree that contains every matched terminal and rule. You can walk this tree with custom visitors and/or listeners. A demo of how to do this can be found in this Q&A: Once grammar is complete, what's the best way to walk an ANTLR v4 tree?

In your case, the grammar would look like this:

grammar X;

COMMA : ',';
SPACE : (' '|'\n') -> skip;
DATA  : ~[, \n]+;

data  : DATA (COMMA DATA)*;

and then create a listener like this:

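import java.util.List;
import org.antlr.v4.runtime.tree.TerminalNode;

public class MyListener extends XBaseListener {

    @Override
    public void enterData(XParser.DataContext ctx) {

        // ctx.DATA() returns every DATA token matched by this rule; COMMA tokens are not included
        List<TerminalNode> dataList = ctx.DATA();
        // do something with `dataList`, e.g. read each token's text via getText()
    }
}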

As you can see, the COMMA is not removed from the parse tree, but inside enterData(...) you use only the DATA tokens.

Bart Kiers answered Mar 15 '23 12:03