Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Order of precedence for token matching in Flex

My apologies if the title of this thread is a little confusing. What I'm asking about is how does Flex (the lexical analyzer) handle issues of precedence?

For example, let's say I have two tokens with similar regular expressions, written in the following order:

"//"[!\/]{1}    return FIRST;
"//"[!\/]{1}\<  return SECOND;

Given the input "//!<", will FIRST or SECOND be returned? Or both?

The FIRST string would be reached before the SECOND string, but it seems that returning SECOND would be the right behavior.

like image 568
Casey Patton Avatar asked Jul 18 '11 17:07

Casey Patton


1 Answers

The longest match is returned.

From flex & bison, Text Processing Tools:

How Flex Handles Ambiguous Patterns

Most flex programs are quite ambiguous, with multiple patterns that can match the same input. Flex resolves the ambiguity with two simple rules:

  • Match the longest possible string every time the scanner matches input.
  • In the case of a tie, use the pattern that appears first in the program.

You can test this yourself, of course:

file: demo.l

%%
"//"[!/]   {printf("FIRST");}
"//"[!/]<  {printf("SECOND");}
%%

int main(int argc, char **argv)
{
    while(yylex() != 0);
    return 0;
}

Note that / and < don't need escaping, and {1} is redundant.

bart@hades:~/Programming/GNU-Flex-Bison/demo$ flex demo.l 
bart@hades:~/Programming/GNU-Flex-Bison/demo$ cc lex.yy.c  -lfl
bart@hades:~/Programming/GNU-Flex-Bison/demo$ ./a.out < in.txt 
SECOND

where in.txt contains //!<.

like image 163
Bart Kiers Avatar answered Nov 05 '22 14:11

Bart Kiers