Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

A notation for empty right-hand sides of rules

When writing a ("theoretical") grammar with a rule with an empty right-hand side, one always use a symbol such as ε (or 1) to make this emptiness explicit:

A → ε | a A

Such a grammar in Yacc and others would then look like

a: | 'a' a

or "worse"

a:       { $$ = new_list(); }
 | a 'a' { $$ = $1; $$->append($1); }
 ;

The fact that in "real world grammars" (Yacc, Bison, etc.) this empty right-hand side part of the rule is not explicitly marked as empty troubles me: it is easy to miss the fact that an rhs is empty, or worse: to forget to insert | and actually use a mid-rule action:

a:       { $$ = new_list(); }
   a 'a' { $$ = $1; $$->append($1); }
 ;

1) I don't know of any tool that provides a means to make empty rhs explicit. Are there any?

Future versions of Bison might support a dedicated symbol, with errors when used in a non-empty rhs, and warnings when a implicitly empty rhs is left.

2) Do people consider this useful?

3) What would be the notation you'd suggest?

Currently, the candidate is $empty:

a: $empty { $$ = new_list(); }
 | a 'a'  { $$ = $1; $$->append($1); }
 ;

EDIT

The chosen syntax is %empty:

a: %empty { $$ = new_list(); }
 | a 'a'  { $$ = $1; $$->append($1); }
 ;

Indeed $empty looks like a pseudo-symbol, such as $accept that Bison generates for the initial rule, or the $@n pseudo-symbols for mid-rule actions, or $eof for, well, end-of-file. But it's definitely not a symbol, it is precisely the absence of symbols.

On the other hand % clearly denotes a directive (some kind of attribute/metadata), like %pred.

So it's a minor difference of syntax, but it's more consistent with the overall syntax. Credit goes to Joel E. Denny.

like image 996
akim Avatar asked Feb 01 '13 13:02

akim


2 Answers

I usually just use a comment:

a: /*epsilon*/ { $$ = new_list(); }
 | a 'a'  { $$ = $1; $$->append($1); }
 ;

Works fine with no changes and makes the intent clear....

IMO, this comes under the heading "If it ain't broke, don't fix it"

like image 97
Chris Dodd Avatar answered Nov 14 '22 22:11

Chris Dodd


I'd suggest the following:

Define the declaration:

%empty ID

whose semantics are two-fold:

1) ID may be used as the only non-rule token in a RHS, to indicate that the RHS is an epsilon production; and

2) epsilon productions not marked with ID are to be considered syntax errors.

Thus, with the declaration:

%empty epsilon

epsilon must be used to mark an empty RHS; without any %empty declaration, the status quo holds, where empty RHSs are unmarked (except perhaps with comments).

That would allow users who like to explicitly mark empty RHSs to do so immediately, without having any impact on existing grammar files or users who don't want to explicitly mark empty RHSs in this fashion.

Personally, I'd probably use such a declaration, although to be honest I'm pretty used to using a comment to mark an empty RHS, and I don't believe I've ever accidentally made an empty RHS. So I wouldn't mark it as a priority feature-request, but nor would I object to its implementation.

like image 3
rici Avatar answered Nov 14 '22 21:11

rici