Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Determining k of LR(k) from this example?

I have prepared the following grammar that generates a subset of C logical and integer arithmetic expressions:

Expression:
    LogicalOrExpression
    LogicalOrExpression ? Expression : LogicalOrExpression

LogicalOrExpression:
    LogicalAndExpression
    LogicalOrExpression || LogicalAndExpression

LogicalAndExpression:
    EqualityExpression
    LogicalAndExpression && RelationalExpression

EqualityExpression:
    RelationalExpression
    EqualityExpression EqualityOperator RelationalExpression

EqualityOperator:
    ==
    !=

RelationalExpression:
    AdditiveExpression
    RelationalExpression RelationalOperator AdditiveExpression

RelationalOperator:
    <
    >
    <=
    >=

AdditiveExpression:
    MultiplicativeExpression
    AdditiveExpression AdditiveOperator MultiplicativeExpression

AdditiveOperator:
    +
    -

MultiplicativeExpression:
    UnaryExpression
    MultiplicativeExpression MultiplicativeOperator UnaryExpression

MultiplicativeOperator:
    *
    /
    %

UnaryExpression:
    PrimaryExpression
    UnaryOperator UnaryExpression

UnaryOperator:
    +
    -
    !

PrimaryExpression:
    BoolLiteral    // TERMINAL
    IntegerLiteral // TERMINAL
    Identifier     // TERMINAL
    ( Expression )

I want to try using shift/reduce parsing and so would like to know what is the smallest k (if any) for which this grammar is LR(k)? (and more generally how to determine the k from an arbitrary grammar if possible?)

like image 991
Andrew Tomazos Avatar asked Oct 22 '22 21:10

Andrew Tomazos


1 Answers

The sample grammar is (almost) an operator precedence grammar, or Floyd grammar (FG). To make it an FG, you'd have to macro-expand the non-terminals whose right-hand sides consist of only a single terminal, because operator precedence grammars must be operator grammars, and an operator grammar has the feature that no right-hand side has two consecutive non-terminals.

All operator-precedence grammars are LR(1). It's also trivial to show whether or not an operator grammar has the precedence property, and particularly trivial in the case that every terminal appears in precisely one right-hand side, as in your grammar. An operator grammar in which every terminal appears in precisely one right-hand side is always an operator-precedence grammar [1] and consequently always LR(1).

FGs are a large class of grammars, some of them even useful (Algol 60, for example, was described by an FG) for which it is easy to answer the question about them being LR(k) for some k, since the answer is always "yes, with K == 1". Just for precision, here are the properties. We use the normal convention where a grammar G is a 4-tuple (N, Σ, P, S) where N is a set of non-terminals; Σ is a set of terminals, P is a set of productions, and S is the start symbol. We write V for N &Union; Σ. In any grammar, we have:

N &Intersection; Σ &equals; ∅
S &in; N
P &subset; V&plus; × V*

The "context-free" requirement restricts P so every left-hand-side is a single non-terminal:

P &subset; Σ × V*

In an operator grammar, P is further restricted: no right-hand side is either empty, and no right-hand side has two consecutive non-terminals:

P &subset; Σ × (V+ − V*ΣΣV*)

In an operator precedence grammar, we define three precedence relations, ⋖, ⋗ and ≐. These are defined in terms of the relations Leads and Trails [2], where `

T Leads V iff T is the first terminal in some string derived from V
T Trails V iff T is the last terminal in some string derived from V

Then:

t1 ⋖ t2 iff ∃v &bepsi; t2 Leads v ∧ N&rightarrow;V*t1vV* &in; P
t1 ⋗ t2 iff ∃v &bepsi; t1 Trails v ∧ N&rightarrow;V*vt2V* &in; P
t1 &esdot; t2 iff N&rightarrow;V*t1t2V* &in; P ∨ N&rightarrow;V*t1V't2V* &in; P

An intuitive way of thinking about those relations is this: Normally when we do the derivations, we just substitute RHS for LHS, but suppose we substitute ⋖ RHS ⋗ instead. Then we can modify a derivation by dropping the non-terminals and collapsing strings of consecutive ⋖ and ⋗ to single symbols, and finally adding &esdot; between any two consecutive terminals which have no intervening operator. From that, we just read off the relations.

Now, we can perform that computation on any operator grammar, but there is nothing which forces the above relations to be exclusive. An operator grammar is a Floyd grammar precisely if those three relations are mutually exclusive.

Verifying that an operator grammar has mutually exclusive precedence relations is straight-forward; Leads and Trails require a transitive closure over First and Last, which is roughly O(|G|2) (it's actually the product of the number of non-terminals and the number of productions); from there, the precedence relations can be computed with a single linear scan over all productions in the grammar, which is O(|G|).

like image 118
rici Avatar answered Oct 27 '22 09:10

rici