I'm currently trying to write a small compiler using Flex+Bison, but I'm kind of lost in terms of what to do with error handling, especially how to make everything fit together. To motivate the discussion, consider the following lexer fragment I'm using for string literals:
["] { BEGIN(STRING_LITERAL); init_string_buffer(); }
<STRING_LITERAL>{
\\\\ { add_char_to_buffer('\\'); }
\\\" { add_char_to_buffer('\"'); }
\\. { /*Invalid escape. How do I treat this error?*/ }
["] { BEGIN(INITIAL); yylval = get_string_buffer(); return TK_STRING; }
}
How do I handle the situation with invalid escapes? Right now I'm just printing an error message and calling exit, but I'd prefer to be able to keep going and detect more than one error per file if possible.
My questions:
There are lots of options. Which one is best is probably a matter of opinion. (And note that SO does not take kindly to questions whose answers are opinions rather than facts.)
It largely depends on how you handle error messages in your application in general. But here are a couple of possibilities:
1. Print an error message directly from the lexer. Tell your error-detection system that compilation was unsuccessful: you might use a global error count (yuk, globals!), or a shared data structure passed to yylex as an additional parameter. Then just ignore the character and continue lexing.
2. Return something like TK_INVALID_STRING to the parser. The parser will need to have appropriate error productions in order to handle and recover from this error, which is a lot more work but has the advantage of putting all error handling into the parser. However, in the particular case of strings, you'll probably want to finish lexing the string up to the closing quote; otherwise, continuing the parse will be fruitless.
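To make the first option concrete, here is a minimal sketch in plain C rather than flex syntax, so it can be compiled on its own. It mimics what the STRING_LITERAL rules would do: report an invalid escape, bump a counter in a shared structure, drop the bad escape, and keep scanning to the closing quote. The names ErrorContext and scan_string_body and the value given to TK_STRING are assumptions made for illustration; they are not part of flex or bison.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical token code; in a real project bison generates it. */
enum { TK_STRING = 258 };

/* Hypothetical shared error state, passed around instead of a global. */
typedef struct {
    const char *filename;
    int error_count;
} ErrorContext;

/* Scan a string literal whose opening quote was already consumed.
   *pos indexes into src; out receives the decoded characters and is
   assumed to be large enough for the whole literal. */
int scan_string_body(ErrorContext *ctx, const char *src, size_t *pos, char *out)
{
    size_t n = 0;

    while (src[*pos] != '\0' && src[*pos] != '"') {
        char c = src[(*pos)++];
        if (c != '\\') {
            out[n++] = c;                 /* ordinary character */
        } else if (src[*pos] == '\\' || src[*pos] == '"') {
            out[n++] = src[(*pos)++];     /* valid escape: \\ or \" */
        } else if (src[*pos] != '\0') {
            /* Invalid escape: report it, count it, drop it, keep going. */
            fprintf(stderr, "%s: error: invalid escape '\\%c'\n",
                    ctx->filename, src[*pos]);
            ctx->error_count++;
            (*pos)++;
        }
        /* A trailing backslash falls through to the unterminated case. */
    }

    if (src[*pos] == '"') {
        (*pos)++;                         /* consume the closing quote */
    } else {
        fprintf(stderr, "%s: error: unterminated string\n", ctx->filename);
        ctx->error_count++;
    }
    out[n] = '\0';
    return TK_STRING;                     /* the parser never sees the error */
}
```

After parsing finishes, the driver would check error_count to decide whether compilation succeeded. For the second option, the same loop could set a flag instead of printing, and return a separate token such as TK_INVALID_STRING once the closing quote has been consumed.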
As to yyerror: there is nothing magical about yyerror. That function is completely your responsibility. The only thing that bison does is call it with a specified set of arguments. If you find it useful for recording errors noticed in the lexer (and I think it probably is), then go ahead and use it. You're totally responsible for declaring yyerror, so put its declaration in whatever shared header file you #include in both the lexer and the parser. Or fiddle around with bison code generation options to get the declaration included in the header file created with bison. Whatever is easier. Once you've figured out how to declare yyerror, you can define it anywhere you want: in the lexer file, in the bison file, or (my preference) in a separate library of support functions.
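As a rough illustration of that split between declaration and definition, the sketch below shows one way a printf-style yyerror could look. The single listing stands in for two files. The header name errors.h, the error_count variable, and the variadic signature are all assumptions; bison only requires that yyerror accept the arguments the generated parser passes, which by default is a single message string.

```c
#include <stdarg.h>
#include <stdio.h>

/* --- would live in a shared header, e.g. a hypothetical errors.h,
       included by both the lexer and the parser --- */
extern int error_count;
void yyerror(const char *fmt, ...);

/* --- would live in a separate file of support functions --- */
int error_count = 0;

/* Print a formatted message and remember that an error occurred. */
void yyerror(const char *fmt, ...)
{
    va_list args;

    fputs("error: ", stderr);
    va_start(args, fmt);
    vfprintf(stderr, fmt, args);
    va_end(args);
    fputc('\n', stderr);
    error_count++;
}
```

With this in place, a lexer action could call yyerror("invalid escape '\\%c'", yytext[1]) and carry on, while the parser's own calls with a plain message go through the same function. One caveat of this design: the parser passes its message as the format argument, so a message containing a percent sign would be misread as a format directive.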
(FWIW, I've tried option 2, and it really seems to me like too much work; option 1 has worked fine for me. But tastes vary, and YMMV; I'm not going to defend my choice here, but I don't mind admitting to it.)
If you are using Bison with C++ output, another option is throwing an exception:
throw yy::parser::syntax_error("invalid character: " + std::string(yytext, yyleng));
If you are using Bison 3.6 or later (with all the target languages, including C), then you can also return the special YYerror token. This is similar to rici's suggestion to return TK_INVALID_STRING, but with that approach the parser would also complain about the unknown TK_INVALID_STRING (so two error messages: one from your call to yyerror, another from yyparse about the unknown TK_INVALID_STRING). That does not happen with YYerror, yet you still properly enter error recovery.
In other words, I would suggest in C (if your yyerror supports variadic arguments):
yyerror (yylloc, _("syntax error: invalid character: %c"), c);
return YYerror;
This is an excerpt of the "bistromathic" example in Bison's distribution (available in /usr/local/share/doc/bison/examples in typical installations, or on Savannah and GitHub).