Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Normalizing structural differences in grammars

Consider the following grammar:

S → A | B
A → xy
B → xyz

This is what I think an LR(0) parser would do given the input xyz:

    | xyz   → shift
  x | yz    → shift
 xy | z     → reduce
  A | z     → shift
 Az |       → fail

If my assumption is correct and we changed rule B to read:

B → Az

now the grammar suddenly becomes acceptable by an LR(0) parser. I presume this new grammar describes the exact same set of strings than the first grammar in this question.

  • What are the differences between the first and second grammars?
  • How do we decouple structural differences in our grammars from the language they describe?
    • Through normalization?
    • What kind of normalization?

To further clarify:

I want to describe a language to a parser, without the structure of the grammar playing a role. I'd like to obtain the most minimal/fundamental description of a set of strings. For LR(k) grammars, I'd like to minimize the k.

like image 459
thwd Avatar asked Sep 26 '14 22:09

thwd


1 Answers

I think your LR(0) parser is not a standard parser:

  • Source

An LR(0) parser is a shift/reduce parser that uses zero tokens of lookahead to determine what action to take (hence the 0). This means that in any configuration of the parser, the parser must have an unambiguous action to choose - either it shifts a specific symbol or applies a specific reduction. If there are ever two or more choices to make, the parser fails and we say that the grammar is not LR(0).

So, When you have:

S->A|B
A->xy
B->Az

or

S->A|B
A->xy
B->xyz

LR(0) will never check B rule, And for both of them it will fail.

State0 - Clousure(S->°A):
   S->°A
   A->°xy

 Arcs:
  0 --> x --> 2
  0 --> A --> 1
-------------------------

State1 - Goto(State0,A):
   S->A°

 Arcs:
  1 --> $ --> Accept
-------------------------

State2 - Goto(State0,x):
   A->x°y

 Arcs:
  2 --> y --> 3
-------------------------

State3 - Goto(State2,y):
   A->xy°

 Arcs:
-------------------------

But if you have

I->S
S->A|B
A->xy
B->xyz                           or B->Az

Both of them will accept the xyz, but in difference states:

State0 - Clousure(I->°S):
   I->°S
   S->°A
   S->°B
   A->°xy                           A->°xy, $z
   B->°xyz                          B->°Az, $

 Arcs:
  0 --> x --> 4
  0 --> S --> 1
  0 --> A --> 2
  0 --> B --> 3
-------------------------

State1 - Goto(State0,S):
   I->S°

 Arcs:
  1 --> $ --> Accept
-------------------------

State2 - Goto(State0,A):
   S->A°                            S->A°, $ 
                                    B->A°z, $
 Arcs:                              2 --> z --> 5
-------------------------

State3 - Goto(State0,B):
   S->B°

 Arcs:
-------------------------

State4 - Goto(State0,x):
   A->x°y                            A->x°y, $z
   B->x°yz

 Arcs:
  4 --> y --> 5                     4 --> y --> 6
-------------------------

State5 - Goto(State4,y):            - Goto(State2,z):
   A->xy°                           B->Az°, $

 Arcs:
  5 --> z --> 6                     -<None>-
-------------------------

State6 - Goto(State5,z):            - Goto(State4,y)
   B->xyz°                          A->xy°, $z

 Arcs:
-------------------------

You can see the Goto Table and Action Table is different.

[B->xyz]                                  [B->Az]
  | Stack   | Input  | Action               | Stack   | Input  | Action
--+---------+--------+----------          --+---------+--------+----------
1 | 0       | xyz$   | Shift              1 | 0       | xyz$   | Shift 
2 | 0 4     | yz$    | Shift              2 | 0 4     | xy$    | Shift
3 | 0 4 5   | z$     | Shift              3 | 0 4 6   | z$     | Reduce A->xy
4 | 0 4 5 6 | $      | Reduce B->xyz      4 | 0 2     | z$     | Shift 
5 | 0 3     | $      | Reduce S->B        5 | 0 2 5   | $      | Reduce B->Az
6 | 0 1     | $      | Accept             6 | 0 3     | $      | Reduce S->B
                                          7 | 0 1     | $      | Accept

Simply when you change B->xyz to B->Az you add an Action to your LR Table to find the differences you can check Action Table and Goto Table (Constructing LR(0) parsing tables)

  • When you have A->xy and B->xyz then you have two bottom handles [xy or xyz] but when you have B->Az you have only one bottom handle [xy] that can accept an additional z.

  • I think related to local optimization -c=a+b; d=a+b -> c=a+b; d=c- when you use B->Az you make B->xyz optimized.

like image 102
shA.t Avatar answered Oct 23 '22 18:10

shA.t