Given the following basic grammar, I want to understand how to handle comment lines. What is missing is the handling of the <CR><LF> that usually terminates a comment line. The only exception is a final comment line directly before the EOF, e.g.:
# comment
abcd := 12 ;
# comment eof without <CR><LF>
grammar CommentLine1a;
//==========================================================
// Options
//==========================================================
//==========================================================
// Lexer Rules
//==========================================================
Int
  : Digit+
  ;
fragment Digit
  : '0'..'9'
  ;
ID_NoDigitStart
  : ( 'a'..'z' | 'A'..'Z' ) ('a'..'z' | 'A'..'Z' | Digit )*
  ;
Whitespace
  : ( ' ' | '\t' | '\r' | '\n' )+ { $channel = HIDDEN ; }
  ; 
//==========================================================
// Parser Rules
//==========================================================
code
  : ( assignment | comment )+
  ;
assignment
  : id_NoDigitStart ':=' id_DigitStart ';'
  ;
id_NoDigitStart
  : ID_NoDigitStart
  ;  
id_DigitStart
  : ( ID_NoDigitStart | Int )+
  ;
comment
  : '#' ~( '\r' | '\n' )*
  ;
Unless you have a very compelling reason to put the comment inside the parser (which I'd like to hear), you should put it in the lexer:
Comment
  :  '#' ~( '\r' | '\n' )*
  ;
And since you already account for line breaks in your Whitespace rule, there's no problem with input like # comment eof without <CR><LF>
Also, if you use literal tokens inside parser rules, ANTLR automatically creates lexer rules for them behind the scenes. So in your case:
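As a minimal sketch of how that could look in the grammar above (ANTLR v3 syntax): the parser rule comment is dropped and replaced by a lexer rule. Sending the token to the HIDDEN channel and the trailing EOF in code are my assumptions, not something the question requires.

```antlr
// Lexer rule: a comment runs from '#' up to, but not including, the line break.
// At the end of the input there is simply no line break left to consume.
// Assumption: comments go to the HIDDEN channel, just like Whitespace.
Comment
  :  '#' ~( '\r' | '\n' )* { $channel = HIDDEN ; }
  ;

// Parser rule: with comments hidden, only assignments remain visible.
// Assumption: EOF is added so the parser must consume the whole input.
code
  :  assignment+ EOF
  ;
```

If the parser needs to see the comments, leave out the channel action and reference the token instead, e.g. ( assignment | Comment )+.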
comment
  :  '#' ~( '\r' | '\n' )*
  ;
would match a '#' token followed by zero or more tokens other than '\r' and '\n', not zero or more characters other than '\r' and '\n'.
For future reference:
Inside parser rules, ~ negates tokens and . matches any token.
Inside lexer rules, ~ negates characters and . matches any character in the range 0x0000 ... 0xFFFF.
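To make the difference concrete, here is a small illustrative pair of rules in ANTLR v3 syntax. The names LineComment, Semi and untilSemi are hypothetical and not part of the grammar in the question.

```antlr
// Lexer rule (name starts with an upper-case letter): ~ works on characters.
// Matches '#' followed by any characters except CR and LF.
LineComment
  :  '#' ~( '\r' | '\n' )*
  ;

// Hypothetical named token for the ';' literal.
Semi
  :  ';'
  ;

// Parser rule (name starts with a lower-case letter): ~ works on tokens.
// Matches one or more tokens of any type other than Semi, then a Semi token.
untilSemi
  :  ~Semi+ Semi
  ;
```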