 

How do I write a non-greedy match in LEX / FLEX?

I'm trying to parse a legacy language (which is similar to 'C') using FLEX and BISON. Everything is working nicely except for matching strings.

This rather odd legacy language doesn't support escaping (quoting) characters in string literals. A backslash is just an ordinary character, so the following are all valid string literals:

"hello"
""
"\"

I'm using the following rule to match string literals:

\".*\"            { yylval.strval = _strdup( yytext ); return LIT_STRING; }

Unfortunately this is a greedy match, so for input like the following:

"hello", "world"

it matches everything as a single string (hello", "world) instead of two separate strings.

The usual non-greedy quantifier .*? doesn't seem to work in FLEX. Any ideas?

asked Nov 12 '10 15:11 by stusmith


2 Answers

Just prohibit a quote character between the opening and closing quotes:

\"[^"]*\"
answered Nov 20 '22 09:11 by horsh


Backslash escaped quotes

The following rule also allows backslash-escaped quotes (\") inside a string:

\"(\\.|[^\n"\\])*\" {
        fprintf( yyout, "STRING: %s\n", yytext );
    }

and disallows newlines inside string constants.

E.g.:

>>> "a\"b""c\d"""
STRING: "a\"b"
STRING: "c\d"
STRING: ""

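It does, however, fail on the "\" literal from the question, because the backslash escapes the would-be closing quote: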

>>> "\"

When implementing such C-like features, look at existing Lex specifications first, e.g. the ANSI C Lex specification: http://www.lysator.liu.se/c/ANSI-C-grammar-l.html