When writing a ("theoretical") grammar with a rule with an empty right-hand side, one always use a symbol such as ε (or 1) to make this emptiness explicit:
A → ε | a A
Such a grammar in Yacc and others would then look like
a: | 'a' a
or "worse"
a: { $$ = new_list(); }
| a 'a' { $$ = $1; $$->append($1); }
;
The fact that in "real world grammars" (Yacc, Bison, etc.) this empty right-hand side part of the rule is not explicitly marked as empty troubles me: it is easy to miss the fact that an rhs is empty, or worse: to forget to insert |
and actually use a mid-rule action:
a: { $$ = new_list(); }
a 'a' { $$ = $1; $$->append($1); }
;
1) I don't know of any tool that provides a means to make empty rhs explicit. Are there any?
Future versions of Bison might support a dedicated symbol, with errors when used in a non-empty rhs, and warnings when a implicitly empty rhs is left.
2) Do people consider this useful?
3) What would be the notation you'd suggest?
Currently, the candidate is $empty
:
a: $empty { $$ = new_list(); }
| a 'a' { $$ = $1; $$->append($1); }
;
The chosen syntax is %empty
:
a: %empty { $$ = new_list(); }
| a 'a' { $$ = $1; $$->append($1); }
;
Indeed $empty
looks like a pseudo-symbol, such as $accept
that Bison generates for the initial rule, or the $@n
pseudo-symbols for mid-rule actions, or $eof
for, well, end-of-file. But it's definitely not a symbol, it is precisely the absence of symbols.
On the other hand %
clearly denotes a directive (some kind of attribute/metadata), like %pred
.
So it's a minor difference of syntax, but it's more consistent with the overall syntax. Credit goes to Joel E. Denny.
I usually just use a comment:
a: /*epsilon*/ { $$ = new_list(); }
| a 'a' { $$ = $1; $$->append($1); }
;
Works fine with no changes and makes the intent clear....
IMO, this comes under the heading "If it ain't broke, don't fix it"
I'd suggest the following:
Define the declaration:
%empty ID
whose semantics are two-fold:
1) ID
may be used as the only non-rule token in a RHS, to indicate that the RHS is an epsilon production; and
2) epsilon productions not marked with ID
are to be considered syntax errors.
Thus, with the declaration:
%empty epsilon
epsilon
must be used to mark an empty RHS; without any %empty
declaration, the status quo holds, where empty RHSs are unmarked (except perhaps with comments).
That would allow users who like to explicitly mark empty RHSs to do so immediately, without having any impact on existing grammar files or users who don't want to explicitly mark empty RHSs in this fashion.
Personally, I'd probably use such a declaration, although to be honest I'm pretty used to using a comment to mark an empty RHS, and I don't believe I've ever accidentally made an empty RHS. So I wouldn't mark it as a priority feature-request, but nor would I object to its implementation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With