
Unable to recognize single line comments in Lex

I am learning Lex. As an exercise I'm generating tokens for the C language and trying to recognize single-line comments ("//"), but my rule conflicts with the division operator:

[1-9][0-9]*|0x[0-9a-fA-F][0-9a-fA-F]*           return NUMBER;
[a-zA-Z][a-zA-Z0-9]*                            return IDENT;
/                                               {return DIVIDE;}

[ \t\r\n]
[//]

But when I run the example and enter //, it is recognized as two division operators. Where should I modify the code? Any suggestions?

Edit:

Lex Code:

%{
#include "y.tab.h"
%}
%array
%%
if                                              {return IF;}
while                                           {return WHILE;}
else                                            {return ELSE;}
int                                             {return INT;}
return                                          {return RETURN;}
\/\/[^\r\n]*
[1-9][0-9]*|0x[0-9a-fA-F][0-9a-fA-F]*           return NUMBER;
[a-zA-Z][a-zA-Z0-9]*                            return IDENT;

[+]                                             {return ADD;}
[-]                                             {return SUB;}
[<]                                             {return LESS;}
[>]                                             {return GREAT;}
[*]                                             {return MULT;}
[/]                                             {return DIVIDE;}
[;]                                             {return SEMICOLON;}

\{                                              return LBRACE;
\}                                              return RBRACE;

[ \t\r\n]

\(                                              return LPAREN;

\)                                              return RPAREN;

.                                               return BADCHAR;
%%

The following is the header file I use:

typedef enum {END=0, WHILE, IF, ELSE,RETURN, IDENT, LPAREN, RPAREN,INT,LBRACE,RBRACE, SEMICOLON, EQUALITY, DIVIDE, MULT, LESS, GREAT,
 ADD, SUB, NUMBER,BADCHAR} Token;

This is the input I ran (the first three lines), followed by the output:

//
/
p
Token 16, text /
Token 16, text /
Token 16, text /
Token 5, text p

When I run it, the comment is consumed and even the lone divide operator is ignored. But once I enter p, it reports the operators entered above as tokens, which it shouldn't be doing.

Note: I am trying to ignore tabs, newline characters, and single-line comments.

Note 2: I have understood where I made the mistake and wanted to share the rule that fixed it: \/\/[^\r\n]*
user265867 asked Feb 12 '10 04:02



1 Answer

According to the Lex manual:

The lexical analysis programs written with Lex accept ambiguous specifications and choose the longest match possible at each input point. If necessary, substantial lookahead is performed on the input, but the input stream will be backed up to the end of the current partition, so that the user has general freedom to manipulate it.

So you should not need to do anything special - // is longer than / so it will prefer a comment over a division operator when it sees two. However, you didn't post your comment rule - where is it?
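The longest-match rule can be sketched by hand in C. This is only an illustration of the idea, not what Lex actually generates: the `kind` enum and the function names are made up for this example, and the comment matcher mimics the pattern \/\/[^\r\n]* from the question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch; not the values from y.tab.h. */
enum kind { K_NONE, K_DIVIDE, K_COMMENT };

/* Length matched by the pattern  \/  at s (0 means no match). */
size_t match_divide(const char *s) {
    return s[0] == '/' ? 1 : 0;
}

/* Length matched by the pattern  \/\/[^\r\n]*  at s (0 means no match). */
size_t match_comment(const char *s) {
    if (s[0] != '/' || s[1] != '/') return 0;
    size_t n = 2;
    /* Consume everything up to, but not including, the line ending or end of input. */
    while (s[n] != '\0' && s[n] != '\r' && s[n] != '\n') n++;
    return n;
}

/* Try both patterns at s and keep the longer match, as Lex does. */
enum kind longest_match(const char *s, size_t *len) {
    assert(s != NULL && len != NULL);
    size_t d = match_divide(s);
    size_t c = match_comment(s);
    if (c > d) { *len = c; return K_COMMENT; }
    if (d > 0) { *len = d; return K_DIVIDE; }
    *len = 0;
    return K_NONE;
}
```

Because the comment matcher stops before the line ending instead of requiring one, it also accepts a comment on the last line of a file that has no trailing newline.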

Edit: never mind, I see it. [//] is a character class. Remove the square brackets. Also, you will want to match to the end of the line - otherwise you will only allow empty comments. So your regex should be something like:

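"//"[^\r\n]*\r\n (adjust as necessary for the newline characters you are supporting - this one requires that a newline be exactly \r\n). Note that the slashes have to be quoted or escaped, as in "//" or \/\/, because an unquoted / is Lex's trailing-context operator.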
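To make the character-class point concrete, here is a small C sketch (the helper names are invented for this example) contrasting what [//] matches with what the two-character sequence matches. A class matches exactly one character from its set, and listing / twice does not change the set, so [//] behaves like [/]. In the first snippet from the question it therefore ties with the earlier / rule at one character each, and Lex breaks ties in favor of the rule listed first, which is why // came out as two DIVIDE tokens.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Match ONE character drawn from `set`, the way a class like [//] does.
   Returns the match length: 1 on success, 0 on failure. */
size_t match_char_class(const char *s, const char *set) {
    assert(s != NULL && set != NULL);
    /* Guard against '\0', which strchr would otherwise find at the end of set. */
    return (s[0] != '\0' && strchr(set, s[0]) != NULL) ? 1 : 0;
}

/* Match the literal string `lit`, the way "//" or \/\/ does.
   Returns the match length: strlen(lit) on success, 0 on failure. */
size_t match_literal(const char *s, const char *lit) {
    assert(s != NULL && lit != NULL);
    size_t n = strlen(lit);
    return strncmp(s, lit, n) == 0 ? n : 0;
}
```

On the input //x, the class consumes a single character while the literal consumes two, so only the literal can ever be longer than the one-character divide rule.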

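Edit 2: @tur1ng brings up a good point - the last line in your file may not end with a newline. I looked it up and flex does support <<EOF>> (see http://pltplp.net/lex-yacc/lex.html.en), but only as a rule on its own; it cannot be combined with other patterns inside a regex. The simpler fix is to not match the newline at all, so the comment rule also works on the last line of the file and your whitespace rule consumes the newline when there is one:

"//"[^\r\n]*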

danben answered Oct 16 '22 10:10
