Adding skip to a rule doesn't do what I expect. Here's a grammar for a pair of tokens separated by a comma and a space. I made one version where the comma is marked skip, and one where it isn't:
grammar Commas;
COMMA: ', ';
COMMASKIP: ', ' -> skip;
DATA: ~[, \n]+;
withoutSkip: data COMMA data '\n';
withSkip: data COMMASKIP data '\n';
data: DATA;
Testing the rule without skip works as expected:
$ echo 'a, b' | grun Commas withoutSkip -tree
(withoutSkip (data a) , (data b) \n)
Testing the rule with skip gives me an error:
$ echo 'a, b' | grun Commas withSkip -tree
line 1:1 mismatched input ', ' expecting COMMASKIP
(withSkip (data a) , b \n)
If I comment out the COMMA and withoutSkip rules I get this:
$ echo 'a, b' | grun Commas withSkip -tree
line 1:3 missing ', ' at 'b'
(withSkip (data a) <missing ', '> (data b) \n)
I am trying to get output that just has the data tokens without the comma, like this:
(withSkip (data a) (data b) \n)
What am I doing wrong?
skip causes the lexer to discard the token, so the parser never receives it. Therefore, a skipped lexer rule cannot be used in parser rules.
Another thing: the lexer tokenizes the input independently of the parser. If two or more lexer rules match the same input, the rule defined first wins over the rules defined later in the grammar, even when the parser is trying to match one of the later rules. In your case, the token COMMASKIP will never be created since COMMA matches the same input.
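The two points above can be illustrated with a toy model in plain Java. This is only a sketch, not ANTLR's implementation: the class ToyLexer, its Rule type, and the NL rule standing in for the '\n' literal are all made up for illustration. It tries every rule at the current position, keeps the longest match, breaks ties in favour of the rule listed first, and drops tokens from rules marked skip.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ToyLexer {
    /** One lexer rule: token name, pattern, and whether matches are discarded. */
    static final class Rule {
        final String name;
        final Pattern pattern;
        final boolean skip;

        Rule(String name, String regex, boolean skip) {
            this.name = name;
            this.pattern = Pattern.compile(regex);
            this.skip = skip;
        }
    }

    /** The question's lexer rules in grammar order; COMMA can be left out. */
    static List<Rule> grammarRules(boolean withComma) {
        List<Rule> rules = new ArrayList<>();
        if (withComma) rules.add(new Rule("COMMA", ", ", false));
        rules.add(new Rule("COMMASKIP", ", ", true));
        rules.add(new Rule("DATA", "[^, \\n]+", false));
        rules.add(new Rule("NL", "\\n", false)); // hypothetical name for the '\n' literal
        return rules;
    }

    /** Returns the names of the tokens that would be handed to the parser. */
    static List<String> tokenize(String input, List<Rule> rules) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            Rule best = null;
            int bestLen = 0;
            for (Rule r : rules) {
                Matcher m = r.pattern.matcher(input).region(pos, input.length());
                // Only a strictly longer match replaces the current best,
                // so on a tie the rule defined first is kept.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    best = r;
                    bestLen = m.end() - pos;
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            if (!best.skip) tokens.add(best.name); // skipped tokens never reach the parser
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a, b\n", grammarRules(true)));  // [DATA, COMMA, DATA, NL]
        System.out.println(tokenize("a, b\n", grammarRules(false))); // [DATA, DATA, NL]
    }
}
```

With COMMA present, the comma always becomes a COMMA token, so a parser rule asking for COMMASKIP can never succeed. With COMMA commented out, the comma is discarded entirely, which is consistent with the "missing ', '" error in the question.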
Try something like this:
grammar Commas;
COMMA : ',' -> skip;
SPACE : (' '|'\n') -> skip;
DATA : ~[, \n]+;
data : DATA+;
So how do I specify where the comma goes without including it in the parse tree? Your code would match a, , b.
You don't. If the comma is significant (i.e. a,,b is invalid), it cannot be skipped in the lexer.
I think in ANTLR 3 you're supposed to use an exclamation point.
In ANTLR 4, you cannot create an AST from your parse. In the new version, all terminals/rules are in one parse tree. You can iterate over this tree with custom visitors and/or listeners. A demo of how to do this can be found in this Q&A: Once grammar is complete, what's the best way to walk an ANTLR v4 tree?
In your case, the grammar would look like this:
grammar X;
COMMA : ',';
SPACE : (' '|'\n') -> skip;
DATA : ~[, \n]+;
data : DATA (COMMA DATA)*;
and then create a listener like this:
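import java.util.List;
import org.antlr.v4.runtime.tree.TerminalNode;

public class MyListener extends XBaseListener {
    @Override
    public void enterData(XParser.DataContext ctx) {
        // One TerminalNode per DATA token; the COMMA tokens are not in this list.
        List<TerminalNode> dataList = ctx.DATA();
        // do something with `dataList`
    }
}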
As you can see, the COMMA is not removed, but inside enterData(...) you simply use only the DATA tokens.
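The trade-off discussed in this thread can be sketched without ANTLR at all. The class below is a hypothetical illustration working on plain lists of token names: matchesDataList is a hand-written stand-in for the pattern DATA (COMMA DATA)*, and dropCommas mimics what the parser would see if COMMA were skipped. A real ANTLR parser would report a syntax error and try to recover rather than return false.

```java
import java.util.List;
import java.util.stream.Collectors;

public class CommaCheck {
    /** Hand-written check for the token pattern DATA (COMMA DATA)*. */
    static boolean matchesDataList(List<String> types) {
        if (types.isEmpty() || !types.get(0).equals("DATA")) return false;
        int i = 1;
        while (i < types.size()) {
            // Every further item must be a COMMA followed by a DATA.
            if (i + 1 >= types.size()) return false;
            if (!types.get(i).equals("COMMA") || !types.get(i + 1).equals("DATA")) return false;
            i += 2;
        }
        return true;
    }

    /** What would remain of the token stream if COMMA were marked skip. */
    static List<String> dropCommas(List<String> types) {
        return types.stream().filter(t -> !t.equals("COMMA")).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> good = List.of("DATA", "COMMA", "DATA");          // a, b
        List<String> bad = List.of("DATA", "COMMA", "COMMA", "DATA");  // a, , b

        System.out.println(matchesDataList(good)); // true
        System.out.println(matchesDataList(bad));  // false
        // Once commas are dropped, the two inputs are indistinguishable.
        System.out.println(dropCommas(good).equals(dropCommas(bad))); // true
    }
}
```

Keeping COMMA in the token stream is what lets the grammar reject a, , b; ignoring the commas afterwards, as the listener does, costs nothing.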