Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regular expression for a string literal in flex/lex

A string consists of a quote mark

"

followed by zero or more of either an escaped anything

\\.

or a non-quote character, non-backslash character

[^"\\]

and finally a terminating quote

"

Put it all together, and you've got

\"(\\.|[^"\\])*\"

The delimiting quotes are escaped because they are Flex meta-characters.


For a single line... you can use this:

\"([^\\\"]|\\.)*\"  {/*matches string-literal on a single line*/;}

How about using a start state...

int enter_dblquotes = 0;

%x DBLQUOTES
%%

\"  { BEGIN(DBLQUOTES); enter_dblquotes++; }

<DBLQUOTES>*\" 
{ 
   if (enter_dblquotes){
       handle_this_dblquotes(yytext); 
       BEGIN(INITIAL); /* revert back to normal */
       enter_dblquotes--; 
   } 
}
         ...more rules follow...

It was similar to that effect (flex uses %s or %x to indicate what state would be expected. When the flex input detects a quote, it switches to another state, then continues lexing until it reaches another quote, in which it reverts back to the normal state.


Paste my code snippet about handling string in flex, hope inspire your thinking.

Use Start Condition to handle string literal will be more scalable and clear.

%x SINGLE_STRING

%%

\"                          BEGIN(SINGLE_STRING);
<SINGLE_STRING>{
  \n                        yyerror("the string misses \" to termiate before newline");
  <<EOF>>                   yyerror("the string misses \" to terminate before EOF");
  ([^\\\"]|\\.)*            {/* do your work like save in here */}
  \"                        BEGIN(INITIAL);
  .                         ;
}