
How should I handle lexical errors in my Flex lexer?

I'm currently trying to write a small compiler using Flex+Bison, but I'm kind of lost about error handling, especially how to make everything fit together. To motivate the discussion, consider the following lexer fragment I'm using for string literals:

["]          { BEGIN(STRING_LITERAL); init_string_buffer(); }
<STRING_LITERAL>{
    \\\\    { add_char_to_buffer('\\'); }
    \\\"    { add_char_to_buffer('\"'); }
    \\.     { /*Invalid escape. How do I treat this error?*/ }
    ["]     { BEGIN(INITIAL); yylval = get_string_buffer(); return TK_STRING; }
}

How do I handle the situation with invalid escapes? Right now I'm just printing an error message and calling exit but I'd prefer to be able to keep going and detect more than one error per file if possible.

My questions:

  • What function do I use to print out error messages? The same yyerror expected by bison later on? Where do I put the definition of yyerror if I have separate files for the lexer and parser?
  • What token code should I return from my action? 0 for "end of file"? Some special TK_INVALID_STRING token?
  • How do I make sure the parser can continue parsing after lexical errors (invalid literals, stray punctuation characters, etc)?
asked Sep 16 '13 21:09 by hugomg



2 Answers

There are lots of options. Which one is best is probably a matter of opinion. (And note that SO does not take kindly to questions whose answers are opinions rather than facts.)

It largely depends on how you handle error messages in your application in general. But here are a couple of possibilities:

  1. Print an error message directly from the lexer. Tell your error-detection system that compilation was unsuccessful: you might use a global error count (yuk, globals!), or a shared data structure passed to yylex as an additional parameter. Then just ignore the offending character and continue lexing.

  2. Return something like TK_INVALID_STRING to the parser. The parser will need to have appropriate error productions in order to handle and recover from this error appropriately, which is a lot more work but has the advantage of putting all error handling into the parser. However, in the particular case of strings, you'll probably want to finish lexing the string up to the closing quote; otherwise, continuing the parse will be fruitless.
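As a rough illustration of option 1, here is a minimal C sketch of "report, count, skip, and keep going" applied to the escapes from the question. The names `lex_error_count` and `decode_string_body` are hypothetical, not part of Flex or Bison; in a real Flex lexer the same logic would sit in the `<STRING_LITERAL>` rule actions rather than in a standalone function.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical global error count; the driver checks it after parsing. */
int lex_error_count = 0;

/*
 * Decode the body of a string literal (the text between the quotes).
 * \\ and \" are the only valid escapes, as in the question's lexer.
 * Any other escape is reported, counted, and dropped, and decoding
 * continues so that later errors are still detected.
 * Writes at most cap - 1 characters plus a terminating NUL into out
 * and returns the number of characters written.
 */
size_t decode_string_body(const char *src, char *out, size_t cap)
{
    size_t n = 0;
    if (cap == 0)
        return 0;
    for (size_t i = 0; src[i] != '\0'; i++) {
        char c = src[i];
        if (c == '\\') {
            char next = src[i + 1];
            if (next == '\0') {
                fprintf(stderr, "lexical error: dangling backslash\n");
                lex_error_count++;
                break;
            }
            i++;                       /* consume the escaped character */
            if (next == '\\' || next == '"') {
                c = next;
            } else {
                fprintf(stderr, "lexical error: invalid escape '\\%c'\n", next);
                lex_error_count++;
                continue;              /* skip it and keep scanning */
            }
        }
        if (n + 1 < cap)
            out[n++] = c;
    }
    out[n] = '\0';
    return n;
}
```

After yyparse returns, the driver would treat the compilation as failed whenever the count is nonzero, even if the parser itself reported no syntax errors.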

As to yyerror: there is nothing magical about yyerror. That function is completely your responsibility. The only thing that bison does is call it with a specified set of arguments. If you find it useful for recording errors noticed in the lexer (and I think it probably is), then go ahead and use it. You're totally responsible for declaring yyerror, so put its declaration in whatever shared header file you #include in both the lexer and the parser. Or fiddle around with bison code generation options to get the declaration included in the header file created by bison. Whatever is easier. Once you've figured out how to declare yyerror, you can define it anywhere you want: in the lexer file, in the bison file, or (my preference) in a separate library of support functions.
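One possible layout, sketched in C under some assumptions: the parser is a plain non-pure one without locations or %parse-param (so bison calls yyerror with a single message string), and the file names "errors.h" and "errors.c" and the `error_count` variable are hypothetical. The variadic signature is a design choice that lets the lexer pass printf-style arguments; the catch is that bison's own messages are then interpreted as format strings, which only matters if one of them contains a '%'.

```c
#include <stdarg.h>
#include <stdio.h>

/* --- would live in a shared header, e.g. a hypothetical "errors.h",
       included by both the lexer and the parser --- */
extern int error_count;
void yyerror(const char *fmt, ...);

/* --- would live in exactly one .c file, e.g. a hypothetical "errors.c" --- */
int error_count = 0;

/* Print a printf-style message to stderr and record that an error occurred. */
void yyerror(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    error_count++;
}
```

A lexer action could then call something like `yyerror("invalid escape: %s", yytext);`, while the generated parser keeps calling it with its own single-string messages.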

(FWIW, I've tried option 2, and it really seems to me like too much work; option 1 has worked fine for me. But tastes vary, and YMMV; I'm not going to defend my choice here, but I don't mind admitting to it.)

answered Nov 06 '22 04:11 by rici



If you are using Bison with C++ output, another option is throwing an exception.

throw yy::parser::syntax_error("invalid character: " + std::string(yytext, yyleng));

If you are using Bison 3.6 or later (with any target language, including C), you can also return the special YYerror token. This is similar to rici's suggestion to return TK_INVALID_STRING, but with that approach the parser also complains about the unexpected TK_INVALID_STRING, so you get two error messages: one from your call to yyerror and another from yyparse. Returning YYerror avoids the second message, yet the parser still enters error recovery properly.

In other words, I would suggest in C (if your yyerror supports variadic arguments):

yyerror (yylloc, _("syntax error: invalid character: %c"), c);
return YYerror;

This is an excerpt of the "bistromathic" example in Bison's distro (available in /usr/local/share/doc/bison/examples in typical distros, or on Savannah and GitHub).

answered Nov 06 '22 03:11 by akim
