
How to handle nested comments in an ANTLR lexer

Tags: antlr4

How can I handle nested comments in an ANTLR4 lexer? That is, I need to count the number of "/*" inside the token and close it only after the same number of "*/" has been seen. As an example, the D language has nested comments of this kind, written "/+ ... +/".

For example, the following lines should be treated as one block of comments:

/* comment 1
   comment 2
   /* comment 3
      comment 4
   */
   // comment 5
   comment 6
*/

My current rules are the following, and they do not handle the nested comment above:

COMMENT : '/*' .*? '*/' -> channel(HIDDEN)
        ;
LINE_COMMENT : '//' ~('\n'|'\r')* '\r'? '\n'  -> channel(HIDDEN)
        ;
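
The non-greedy `.*?` stops at the first `*/` it meets, so the outer comment is cut short as soon as the inner one closes. Here is a minimal sketch of that behaviour, using Python's `re` module as a stand-in for the ANTLR rule. This is an assumption made for illustration: Python's non-greedy matching is only an analogy for the lexer, and `FLAT_COMMENT` and `sample` are hypothetical names.

```python
import re

# Stand-in for the rule: COMMENT : '/*' .*? '*/'
# (an analogy only; this is not how ANTLR itself is implemented)
FLAT_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

sample = (
    "/* comment 1\n"
    "   /* comment 3\n"
    "   */\n"
    "   comment 6\n"
    "*/\n"
)

match = FLAT_COMMENT.match(sample)
matched = match.group(0)
leftover = sample[match.end():]

# The match ends at the first "*/", so "comment 6" and the final "*/"
# are left over and would be lexed as ordinary input.
print(repr(matched))
print(repr(leftover))
```

The leftover text is what makes the nested case fail: it is no longer inside any comment token.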
R71 asked Dec 18 '14 04:12



3 Answers

Terence Parr uses these two lexer rules in his Swift ANTLR4 grammar to lex nested comments:

COMMENT : '/*' (COMMENT|.)*? '*/' -> channel(HIDDEN) ;
LINE_COMMENT  : '//' .*? '\n' -> channel(HIDDEN) ;
mikebridge answered Sep 18 '22 23:09



I'm using:

COMMENT: '/*' ('/'*? COMMENT | ('/'* | '*'*) ~[/*])*? '*'*? '*/' -> skip;

This forces any /* inside a comment to be the beginning of a nested comment and similarly with */. In other words, there's no way to recognize /* and */ other than at the beginning and end of the rule COMMENT.

This way, an input such as /* /* /* */ a */ is not recognized in its entirety as one (bad) comment with mismatched /*s and */s, as it would be with COMMENT: '/*' (COMMENT|.)*? '*/' -> skip;. Instead it is lexed as /, followed by *, followed by the correctly nested comment /* /* */ a */.
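
The strict reading described above, where every /* opens a level and every */ closes one, can be sketched outside ANTLR as a plain depth counter. This is an illustrative Python sketch, not what ANTLR generates from the rule; `match_nested_comment` is a hypothetical helper name.

```python
from typing import Optional


def match_nested_comment(text: str, start: int = 0) -> Optional[int]:
    """Return the index just past a balanced /* ... */ comment that
    begins at `start`, or None if none begins there or it never closes."""
    if not text.startswith("/*", start):
        return None
    depth = 0
    i = start
    while i < len(text):
        if text.startswith("/*", i):
            depth += 1  # every "/*" opens a nested level
            i += 2
        elif text.startswith("*/", i):
            depth -= 1  # every "*/" closes the innermost level
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    return None  # ran out of input with comments still open


text = "/* /* /* */ a */"
print(match_nested_comment(text, 0))  # None: three opens, only two closes
end = match_nested_comment(text, 3)
print(text[3:end])                    # /* /* */ a */
```

Starting at offset 0 the counter never returns to zero, so no comment is matched there. Starting at the second /* it finds the balanced comment, which leaves the leading / and * to be lexed separately.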

KinGamer answered Sep 19 '22 23:09



This works for ANTLR3. It allows nested comments and '*' within a comment.

fragment
F_MultiLineCommentTerm
:
(   {LA(1) == '*' && LA(2) != '/'}? => '*'
|   {LA(1) == '/' && LA(2) == '*'}? => F_MultiLineComment
|   ~('*') 
)*
;   

fragment
F_MultiLineComment
:
'/*' 
F_MultiLineCommentTerm
'*/'
;   

H_MultiLineComment
:   r=  F_MultiLineComment
    {   $channel=HIDDEN;
        fprintf(stderr,"F_MultiLineComment[\%s]",$r->getText($r)->chars); 
    }
;
Douglas answered Sep 18 '22 23:09
