Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Bison: Optional tokens in a single rule

i'm using GNU Bison 2.4.2 to write a grammar for a new language i'm working on and i have a question. When i specify a rule, let's say :

statement : T_CLASS T_IDENT  '{' T_CLASS_MEMBERS '}' {
           // create a node for the statement ...
}

If i have a variation on the rule, for instance

statement : T_CLASS T_IDENT T_EXTENDS T_IDENT_LIST  '{' T_CLASS_MEMBERS '}' {
           // create a node for the statement ...
}

Where (from flex scanner rules) :

"class"                     return T_CLASS;
"extends"                   return T_EXTENDS;
[a-zA-Z\_][a-zA-Z0-9\_]*    return T_IDENT;

(and T_IDENT_LIST is a rule for comma separated identifiers).

Is there any way to specify all of this only in one rule, setting somehow the "T_EXTENDS T_IDENT_LIST" as optional? I've already tried with

 T_CLASS T_IDENT (T_EXTENDS T_IDENT_LIST)? '{' T_CLASS_MEMBERS '}' {
     // create a node for the statement ...
 } 

But Bison gave me an error.

Thanks

like image 266
Simone Margaritelli Avatar asked Apr 19 '10 17:04

Simone Margaritelli


People also ask

What is $$ in bison?

If there were a useful semantic value associated with the `+' token, it could be referred to as $2 . If you don't specify an action for a rule, Bison supplies a default: $$ = $1 . Thus, the value of the first symbol in the rule becomes the value of the whole rule.

How Bison parser works?

The Bison parser is a bottom-up parser. It tries, by shifts and reductions, to reduce the entire input down to a single grouping whose symbol is the grammar's start-symbol.

What is bison in compiler?

Bison is a general-purpose parser generator that converts an annotated context-free grammar into a deterministic LR or generalized LR (GLR) parser employing LALR (1) parser tables. As an experimental feature, Bison can also generate IELR (1) or canonical LR(1) parser tables.


1 Answers

To make a long story short, no. Bison only deals with LALR(1) grammars, which means it only uses one symbol of lookahead. What you need is something like this:

statement: T_CLASS T_IDENT extension_list '{' ...

extension_list: 
              | T_EXTENDS T_IDENT_LIST
              ;

There are other parser generators that work with more general grammars though. If memory serves, some of them support optional elements relatively directly like you're asking for.

like image 56
Jerry Coffin Avatar answered Oct 16 '22 16:10

Jerry Coffin