
Parsing single-line C-style comments with Antlr

Tags: java, antlr

I wrote a grammar for a small language that supports C-style single-line comments, e.g.:

  // this is a comment

Here is a fragment of the grammar I wrote for this language, using ANTLR v3.0.1:

  SINGLELINE_COMMENT
          :      '/' '/' (options {greedy=false;} : ~('\r' | '\n'))* ('\r' | '\n')+ {$channel=HIDDEN;};

  WS      :      (' '|'\r'|'\t'|'\u000C'|'\n')+ {$channel=HIDDEN;};

This mostly works, except that when the comment is the last thing in the script and has no terminating NL/CR, I get an annoying message from ANTLR at runtime:

 line 1:20 required (...)+ loop did not match anything at character '<EOF>'

How can I get rid of this message? I tried adding the EOF token to the (...)+ expression, but that does not work.

asked Dec 08 '12 08:12 by insitu


1 Answer

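You don't need the greedy=... option here: it is usually only needed when a rule contains .* or .+. And since your WS rule already puts line-break characters on the hidden channel, you can drop the trailing ('\r' | '\n')+ from SINGLELINE_COMMENT. The comment then ends just before the line break or at the end of the input, so a comment on the last line no longer fails the required (...)+ loop: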

SINGLELINE_COMMENT
 : '//' ~('\r' | '\n')* {$channel=HIDDEN;}
 ;

WS 
 : (' '|'\r'|'\t'|'\u000C'|'\n')+ {$channel=HIDDEN;}
 ;
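
To see why dropping the line-break requirement helps, the two comment rules can be approximated with plain java.util.regex patterns. This is only an analogy and not ANTLR itself: the class name CommentRuleSketch, the helper matchedLength, and the two patterns are hypothetical stand-ins for the original and the revised SINGLELINE_COMMENT rule.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough regex analogue of the two lexer rules (an illustration, not ANTLR code).
class CommentRuleSketch {
    // Original rule: "//", any non-line-break chars, then at least one line break.
    static final Pattern ORIGINAL = Pattern.compile("//[^\\r\\n]*[\\r\\n]+");
    // Revised rule: "//" and any non-line-break chars; line breaks are left to WS.
    static final Pattern REVISED = Pattern.compile("//[^\\r\\n]*");

    // Returns how many leading characters of input the rule consumes, or -1 if it fails.
    static int matchedLength(Pattern rule, String input) {
        Matcher m = rule.matcher(input);
        return m.lookingAt() ? m.end() : -1;
    }

    public static void main(String[] args) {
        String atEof = "// c";      // comment at end of input, no line break
        String midFile = "// c\nx"; // comment followed by more input

        System.out.println(matchedLength(ORIGINAL, atEof));   // -1: line break required
        System.out.println(matchedLength(REVISED, atEof));    // 4: whole comment matched
        System.out.println(matchedLength(ORIGINAL, midFile)); // 5: comment plus '\n'
        System.out.println(matchedLength(REVISED, midFile));  // 4: '\n' is left for WS
    }
}
```

In the last case the line break is not consumed by the comment pattern, which mirrors how the revised grammar leaves it for the WS rule to put on the hidden channel.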
answered Sep 21 '22 16:09 by Bart Kiers