 

Using character literals as terminals in bison

I'm trying to understand flex/bison, but the documentation is a bit difficult for me, and I've probably grossly misunderstood something. Here's a test case: http://namakajiri.net/misc/bison_charlit_test/

File "a" contains the single character 'a'. "foo.y" has a trivial grammar like this:

%%

file: 'a' ;

The generated parser can't parse file "a"; it gives a syntax error.

The grammar "bar.y" is almost the same, except that I replaced the character literal with a named token:

%token TOK_A;

%%

file: TOK_A;

and then in bar.lex:

a       { return TOK_A; }

This one works just fine.

What am I doing wrong in trying to use character literals directly as bison terminals, like in the docs?

I'd like my grammar to look like "statement: selector '{' property ':' value ';' '}'" and not "statement: selector LBRACE property COLON value SEMIC RBRACE"...

I'm running bison 2.5 and flex 2.5.35 on Debian wheezy.

asked Oct 07 '22 03:10 by melissa_boiko


1 Answer


The problem shows up at run time, not at compile time.

The trouble is that you have two radically different lexical analyzers.

The bar.lex analyzer recognizes an 'a' in the input and returns it as TOK_A; any other character falls through to flex's default rule, which simply echoes it to the output.

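The foo.lex analyzer has no rules at all, so the default rule echoes every single character and yylex() never returns a token until it reaches end of input. The parser therefore sees only end-of-file where the grammar requires 'a', which is why it reports a syntax error.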

foo.lex — as written

%{
#include "foo.tab.h"
%}

%%

foo.lex — equivalent

%{
#include "foo.tab.h"
%}

%%
. { ECHO; }

foo.lex — required

%{
#include "foo.tab.h"
%}

%%
. { return *yytext; }

Working code

Here's some working code with diagnostic printing in place.

foo-lex.l

%%
. { printf("Flex: %d\n", *yytext); return *yytext; }

foo.y

%{
#include <stdio.h>
int yylex(void);               /* defined in lex.yy.c */
void yyerror(const char *s);
%}

%%

file: 'a' { printf("Bison: got file!\n"); }
    ;

%%

int main(void)
{
    return yyparse();
}

void yyerror(const char *s)
{
    fprintf(stderr, "%s\n", s);
}

Compilation and execution

$ flex foo-lex.l
$ bison foo.y
$ gcc -o foo foo.tab.c lex.yy.c -lfl
$ echo a | ./foo
Flex: 97
Bison: got file!

$

Point of detail: how did that blank line get into the output? Answer: the lexical analyzer put it there. The pattern . does not match a newline, so the newline fell through to flex's default rule, which behaves as if there were a rule:

\n    { ECHO; }

Because the newline was echoed instead of being returned as a token, the parser never saw it, which is why the input was accepted. If you change the foo-lex.l file to:

%%
.       { printf("Flex-1: %d\n", *yytext); return *yytext; }
\n      { printf("Flex-2: %d\n", *yytext); return *yytext; }

and then recompile and run again, the output is:

$ echo a | ./foo
Flex-1: 97
Bison: got file!
Flex-2: 10
syntax error
$

This time there is no blank line, and the input is rejected: the newline now reaches the parser as token 10, and the grammar does not allow a newline to appear in a valid 'file'.

answered Oct 10 '22 03:10 by Jonathan Leffler