I'm trying to parse a legacy language (which is similar to 'C') using FLEX and BISON. Everything is working nicely except for matching strings.
This rather odd legacy language doesn't support escaping characters in string literals (a backslash is just an ordinary character), so the following are all valid string literals:
"hello"
""
"\"
I'm using the following rule to match string literals:
\".*\" { yylval.strval = _strdup( yytext ); return LIT_STRING; }
Unfortunately this is a greedy match, so it matches code like the following:
"hello", "world"
as a single string (hello", "world).
The usual non-greedy quantifier .*? doesn't seem to work in FLEX. Any ideas?
Flex has no non-greedy quantifiers, so just prohibit a quote from appearing between the opening and closing quotes:
\"[^"]*\"
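To see why the negated character class stops at the first closing quote, here is a minimal C sketch that mimics what this pattern accepts. The function name match_plain_literal is hypothetical and not part of flex; it is a hand-written illustration of the pattern's behaviour, not how flex matches internally.

```c
#include <stddef.h>

/* Hypothetical helper (not a flex API): returns the length of the prefix
 * of s that the pattern \"[^"]*\" would match, or 0 if s does not start
 * with such a literal. */
size_t match_plain_literal(const char *s)
{
    size_t i = 1;

    if (s[0] != '"')
        return 0;               /* must begin with an opening quote */
    while (s[i] != '\0' && s[i] != '"')
        i++;                    /* [^"]* : any character except a quote */
    if (s[i] != '"')
        return 0;               /* input ended before a closing quote */
    return i + 1;               /* count the closing quote as well */
}
```

Called on the text "hello", "world" this returns 7, the length of "hello" including both quotes, and on "\" it returns 3 because the backslash is treated as an ordinary character. Note that [^"] also matches a newline, so with this rule a literal may span several lines.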
Backslash-escaped quotes
The following rule also allows backslash-escaped quotes inside the literal:
\"(\\.|[^\n"\\])*\" {
fprintf( yyout, "STRING: %s\n", yytext );
}
It also disallows newlines inside string constants.
For example:
>>> "a\"b""c\d"""
STRING: "a\"b"
STRING: "c\d"
STRING: ""
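It fails on the following input, because the backslash escapes the closing quote and the literal is never terminated (the question lists this as a valid literal in the legacy language):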
>>> "\"
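The behaviour of the escape-aware pattern can be checked by hand with a small C sketch. The function name match_escaped_literal is hypothetical and not part of flex; the code only imitates what \"(\\.|[^\n"\\])*\" accepts, assuming that . matches any character except a newline.

```c
#include <stddef.h>

/* Hypothetical helper (not a flex API): returns the length of the prefix
 * of s that the pattern \"(\\.|[^\n"\\])*\" would match, or 0 if s does
 * not start with such a literal. */
size_t match_escaped_literal(const char *s)
{
    size_t i = 1;

    if (s[0] != '"')
        return 0;                   /* must begin with an opening quote */
    for (;;) {
        char c = s[i];

        if (c == '"')
            return i + 1;           /* unescaped quote closes the literal */
        if (c == '\0' || c == '\n')
            return 0;               /* unterminated, or a raw newline */
        if (c == '\\') {
            /* \\. : a backslash plus any one character except newline */
            if (s[i + 1] == '\0' || s[i + 1] == '\n')
                return 0;
            i += 2;
        } else {
            i++;                    /* [^\n"\\] : an ordinary character */
        }
    }
}
```

On the sample input "a\"b""c\d""" this returns 6 for the first literal, then 5 and 2 when called again at the following offsets, which agrees with the three STRING lines above. On "\" it returns 0, since the backslash consumes the only remaining quote.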
When implementing such C-like features, make sure to look for existing Lex implementations, e.g.: http://www.lysator.liu.se/c/ANSI-C-grammar-l.html