How to handle nested comments in antlr4 lexer? ie I need to count the number of "/*" inside this token and close only after the same number of "*/" have been received. As an example, the D language has such nested comments as "/+ ... +/"
For example, the following lines should be treated as one block of comments:
/* comment 1
comment 2
/* comment 3
comment 4
*/
// comment 5
comment 6
*/
My current code is the following, and it does not work on the above nested comment:
COMMENT : '/*' .*? '*/' -> channel(HIDDEN)
;
LINE_COMMENT : '//' ~('\n'|'\r')* '\r'? '\n' -> channel(HIDDEN)
;
Terence Parr has these two lexer lines in his Swift Antlr4 grammar for lexing out nested comments:
COMMENT : '/*' (COMMENT|.)*? '*/' -> channel(HIDDEN) ;
LINE_COMMENT : '//' .*? '\n' -> channel(HIDDEN) ;
I'm using:
COMMENT: '/*' ('/'*? COMMENT | ('/'* | '*'*) ~[/*])*? '*'*? '*/' -> skip;
This forces any /*
inside a comment to be the beginning of a nested comment and similarly with */
. In other words, there's no way to recognize /*
and */
other than at the beginning and end of the rule COMMENT
.
This way, something like /* /* /* */ a */
would not be recognized entirely as a (bad) comment (mismatched /*
s and */
s), as it would if using COMMENT: '/*' (COMMENT|.)*? '*/' -> skip;
, but as /
, followed by *
, followed by correct nested comments /* /* */ a */
.
Works for Antlr3.
Allows nested comments and '*' within a comment.
fragment
F_MultiLineCommentTerm
:
( {LA(1) == '*' && LA(2) != '/'}? => '*'
| {LA(1) == '/' && LA(2) == '*'}? => F_MultiLineComment
| ~('*')
)*
;
fragment
F_MultiLineComment
:
'/*'
F_MultiLineCommentTerm
'*/'
;
H_MultiLineComment
: r= F_MultiLineComment
{ $channel=HIDDEN;
printf(stder,"F_MultiLineComment[\%s]",$r->getText($r)->chars);
}
;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With