Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Bison: How to ignore a token if it doesn't fit into a rule

I'm writing a program that handles comments as well as a few other things. If a comment is in a specific place, then my program does something.

Flex passes a token upon finding a comment, and Bison then looks to see if that token fits into a particular rule. If it does, then it takes an action associated with that rule.

Here's the thing: the input I'm receiving might actually have comments in the wrong places. In this case, I just want to ignore the comment rather than flagging an error.

My question:
How can I use a token if it fits into a rule, but ignore it if it doesn't? Can I make a token "optional"?

(Note: The only way I can think of of doing this right now is scattering the comment token in every possible place in every possible rule. There MUST be a better solution than this. Maybe some rule involving the root?)

like image 900
Casey Patton Avatar asked Jul 22 '11 23:07

Casey Patton


2 Answers

One solution may be to use bison's error recovery (see the Bison manual).

To summarize, bison defines the terminal token error to represent an error (say, a comment token returned in the wrong place). That way, you can (for example) close parentheses or braces after the wayward comment is found. However, this method will probably discard a certain amount of parsing, because I don't think bison can "undo" reductions. ("Flagging" the error, as with printing a message to stderr, is not related to this: you can have an error without printing an error--it depends on how you define yyerror.)

You may instead want to wrap each terminal in a special nonterminal:

term_wrap: comment TERM

This effectively does what you're scared to do (put in a comment in every single rule), but it does it in fewer places.

To force myself to eat my own dog food, I made up a silly language for myself. The only syntax is print <number> please, but if there's (at least) one comment (##) between the number and the please, it prints the number in hexadecimal, instead.

Like this:

print 1 please
1
## print 2 please
2
print ## 3 please
3
print 4 ## please
0x4
print 5 ## ## please
0x5
print 6 please ##
6

My lexer:

%{
#include <stdio.h>
#include <stdlib.h>
#include "y.tab.h"
%}

%%

print           return PRINT;
[[:digit:]]+    yylval = atoi(yytext); return NUMBER;
please          return PLEASE;
##              return COMMENT;

[[:space:]]+    /* ignore */
.               /* ditto */

and the parser:

%debug
%error-verbose
%verbose
%locations

%{
#include <stdio.h>
#include <string.h>

void yyerror(const char *str) {
        fprintf(stderr, "error: %s\n", str);
}

int yywrap() {
        return 1;
} 

extern int yydebug;
int main(void) {
    yydebug = 0;
    yyparse();
}
%}

%token PRINT NUMBER COMMENT PLEASE

%%

commands: /* empty */
        |
        commands command
    ;

command: print number comment please {
        if ($3) {
            printf("%#x", $2);
        } else {
            printf("%d", $2);
        }
        printf("\n");
     }
     ;

print: comment PRINT
     ;

number: comment NUMBER {
        $$ = $2;
      }
      ;

please: comment PLEASE
      ;

comment: /* empty */ {
            $$ = 0;
       }
       |
        comment COMMENT {
            $$ = 1;
        }
    ;

So, as you can see, not exactly rocket science, but it does the trick. There's a shift/reduce conflict in there, because of the empty string matching comment in multiple places. Also, there's no rule to fit comments in between the final please and EOF. But overall, I think it's a good example.

like image 77
JXG Avatar answered Jan 02 '23 11:01

JXG


Treat comments as whitespace at the lexer level. But keep two separate rules, one for whitespace and one for comments, both returning the same token ID.

  • The rule for comments (+ optional whitespace) keeps track of the comment in a dedicated structure.
  • The rule for whitespace resets the structure.

When you enter that “specific place”, look if the last whitespace was a comment or trigger an error.

like image 31
fra Avatar answered Jan 02 '23 13:01

fra