I have prepared the following grammar that generates a subset of C logical and integer arithmetic expressions:
Expression:
LogicalOrExpression
LogicalOrExpression ? Expression : LogicalOrExpression
LogicalOrExpression:
LogicalAndExpression
LogicalOrExpression || LogicalAndExpression
LogicalAndExpression:
EqualityExpression
LogicalAndExpression && RelationalExpression
EqualityExpression:
RelationalExpression
EqualityExpression EqualityOperator RelationalExpression
EqualityOperator:
==
!=
RelationalExpression:
AdditiveExpression
RelationalExpression RelationalOperator AdditiveExpression
RelationalOperator:
<
>
<=
>=
AdditiveExpression:
MultiplicativeExpression
AdditiveExpression AdditiveOperator MultiplicativeExpression
AdditiveOperator:
+
-
MultiplicativeExpression:
UnaryExpression
MultiplicativeExpression MultiplicativeOperator UnaryExpression
MultiplicativeOperator:
*
/
%
UnaryExpression:
PrimaryExpression
UnaryOperator UnaryExpression
UnaryOperator:
+
-
!
PrimaryExpression:
BoolLiteral // TERMINAL
IntegerLiteral // TERMINAL
Identifier // TERMINAL
( Expression )
I want to try using shift/reduce parsing and so would like to know what is the smallest k (if any) for which this grammar is LR(k)? (and more generally how to determine the k from an arbitrary grammar if possible?)
The sample grammar is (almost) an operator precedence grammar, or Floyd grammar (FG). To make it an FG, you'd have to macro-expand the non-terminals whose right-hand sides consist of only a single terminal, because operator precedence grammars must be operator grammars, and an operator grammar has the feature that no right-hand side has two consecutive non-terminals.
All operator-precedence grammars are LR(1)
. It's also trivial to show whether or not an operator grammar has the precedence property, and particularly trivial in the case that every terminal appears in precisely one right-hand side, as in your grammar. An operator grammar in which every terminal appears in precisely one right-hand side is always an operator-precedence grammar [1] and consequently always LR(1)
.
FGs are a large class of grammars, some of them even useful (Algol 60, for example, was described by an FG) for which it is easy to answer the question about them being LR(k)
for some k
, since the answer is always "yes, with K == 1
". Just for precision, here are the properties. We use the normal convention where a grammar G
is a 4-tuple (N, Σ, P, S) where N
is a set of non-terminals; Σ
is a set of terminals, P
is a set of productions, and S
is the start symbol. We write V
for N ⋃ Σ
. In any grammar, we have:
N ⋂ Σ = ∅
S ∈ N
P ⊂ V+ × V*
The "context-free" requirement restricts P
so every left-hand-side is a single non-terminal:
P ⊂ Σ × V*
In an operator grammar, P
is further restricted: no right-hand side is either empty, and no right-hand side has two consecutive non-terminals:
P ⊂ Σ × (V+ − V*ΣΣV*)
In an operator precedence grammar, we define three precedence relations, ⋖, ⋗ and ≐. These are defined in terms of the relations Leads
and Trails
[2], where `
T Leads V iff T is the first terminal in some string derived from V
T Trails V iff T is the last terminal in some string derived from V
Then:
t1 ⋖ t2 iff ∃v ϶ t2 Leads v ∧ N→V*t1vV* ∈ P
t1 ⋗ t2 iff ∃v ϶ t1 Trails v ∧ N→V*vt2V* ∈ P
t1 ≐ t2 iff N→V*t1t2V* ∈ P ∨ N→V*t1V't2V* ∈ P
An intuitive way of thinking about those relations is this: Normally when we do the derivations, we just substitute RHS for LHS, but suppose we substitute ⋖ RHS ⋗
instead. Then we can modify a derivation by dropping the non-terminals and collapsing strings of consecutive ⋖ and ⋗ to single symbols, and finally adding ≐ between any two consecutive terminals which have no intervening operator. From that, we just read off the relations.
Now, we can perform that computation on any operator grammar, but there is nothing which forces the above relations to be exclusive. An operator grammar is a Floyd grammar precisely if those three relations are mutually exclusive.
Verifying that an operator grammar has mutually exclusive precedence relations is straight-forward; Leads
and Trails
require a transitive closure over First
and Last
, which is roughly O(|G|2)
(it's actually the product of the number of non-terminals and the number of productions); from there, the precedence relations can be computed with a single linear scan over all productions in the grammar, which is O(|G|)
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With