
antlr4 grammar - how to match EOF/NL

Tags: antlr4

How would I match a file where the last line has no newline at the end? When I use the commented-out line (with EOF), the parser goes into what looks like an infinite loop (i.e., it hangs).

Here is the grammar, mostly borrowed from tparr's work:

grammar csv;

prog : row+ ;
row :  field (',' field)* NL;
// row :  field (',' field)* (NL|EOF); // doesn't work

field : STR | QSTR | ; // field can be empty

STR : ~[\n,"]+ ;
QSTR : Q (QQ|~'"')* Q ;
NL : '\n';

fragment QQ : '""' ;
fragment Q : '"' ;

And here is the corresponding data file:

Details,Month,Amount
Mid Bonus,June,"$2,000"
,January,"""zippo"""
Total Bonuses,"","$5,000"<EOF is on the same line>
RoyM asked Sep 06 '25 03:09


1 Answer

The parser goes into an infinite loop because your rule row (the variant that includes EOF) can match the empty string at the end of the input:

  • field can match the empty string
  • (',' field)* can obviously match the empty string because of the * quantifier
  • the EOF token is never consumed (in effect it occurs infinitely many times), so it can be matched again and again without the parser advancing.
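
The three points above can be seen in a toy model. This is only a hand-written Python sketch of the failing rule, not how ANTLR actually predicts alternatives; the function name match_row and the string token names are made up for illustration.

```python
# Toy model of: row : field (',' field)* (NL | EOF) ;
# Tokens are plain strings; "EOF" stands in for the end-of-file token.

def match_row(tokens, pos):
    """Return the position after one row, or None if no row matches at pos."""
    def peek(i):
        # Past the end of the list we keep answering "EOF", just as a token
        # stream keeps reporting end of file however often it is asked.
        return tokens[i] if i < len(tokens) else "EOF"

    def field(i):
        # field : STR | QSTR | ;  (the empty alternative consumes nothing)
        return i + 1 if peek(i) in ("STR", "QSTR") else i

    pos = field(pos)
    while peek(pos) == ",":
        pos = field(pos + 1)
    if peek(pos) == "NL":
        return pos + 1
    if peek(pos) == "EOF":
        return pos  # EOF matches, but the position does not advance
    return None

tokens = ["STR", ",", "QSTR"]                  # one row, no trailing newline
after_first = match_row(tokens, 0)             # consumes the whole row
after_second = match_row(tokens, after_first)  # succeeds again, consuming nothing
```

Both calls succeed and return the same position, so a loop such as row+ that keeps going while a row matches has no reason to stop.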

Instead of thinking of a row as ending with a newline, why not think of the newline as a row separator?

prog : row (NL row)* EOF;
row  : field (',' field)*;

That's untested but should work fine.
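
As a rough check of the separator idea, here is a small hand-rolled Python sketch that mirrors the two rules above with a regex tokenizer. It is an illustration under assumptions, not ANTLR-generated code: the names tokenize and parse and the COMMA token name are invented here, and the token patterns are my transcription of STR, QSTR and NL from the question.

```python
import re

# One alternative per token type, transcribed from the lexer rules in the question.
TOKEN = re.compile(
    r'(?P<QSTR>"(?:""|[^"])*")'   # QSTR : Q (QQ|~'"')* Q
    r'|(?P<STR>[^\n,"]+)'         # STR  : ~[\n,"]+
    r'|(?P<COMMA>,)'
    r'|(?P<NL>\n)'
)

def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    tokens.append(("EOF", ""))    # a single EOF marker, checked once at the end
    return tokens

def parse(text):
    """prog : row (NL row)* EOF ;   row : field (',' field)* ;"""
    tokens = tokenize(text)
    pos = 0

    def field():
        nonlocal pos
        kind, value = tokens[pos]
        if kind in ("STR", "QSTR"):
            pos += 1
            return value
        return ""                 # the empty alternative of field

    def row():
        nonlocal pos
        fields = [field()]
        while tokens[pos][0] == "COMMA":
            pos += 1
            fields.append(field())
        return fields

    rows = [row()]
    while tokens[pos][0] == "NL":  # every extra row must first consume an NL
        pos += 1
        rows.append(row())
    if tokens[pos][0] != "EOF":
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return rows

data = ('Details,Month,Amount\n'
        'Mid Bonus,June,"$2,000"\n'
        ',January,"""zippo"""\n'
        'Total Bonuses,"","$5,000"')   # last line has no newline
rows = parse(data)
```

Because each repetition has to consume an NL, the loop always advances and EOF is matched exactly once. One consequence of this shape, at least in this sketch: a file that does end with a newline produces one extra row containing a single empty field, which the caller may want to drop.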

Lucas Trzesniewski answered Sep 08 '25 00:09