Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to transform a production to LL(1) for a list separated by a semicolon?

I'm reading this introductory book on parsing (which is pretty good btw) and one of the exercice is to "build a parser for your favorite language." Since I don't want to die today, I thought I could do a parser for something relatively simple, ie a simplified CSS.

Note: This book teach you how to write a LL(1) parser using the recursive-descent algorithm.

So, as a sub-exercice, I am building the grammar from what I know of CSS. But I'm stuck on a production that I can't transform in LL(1) :

//EBNF
block = "{", declaration, {";", declaration}, [";"], "}"

//BNF
<block> =:: "{" <declaration> "}"
<declaration> =:: <single-declaration> <opt-end> | <single-declaration> ";" <declaration>
<opt-end> =:: "" | ";"

This describe a CSS block. Valid block can have the form :

{ property : value }
{ property : value; }
{ property : value; property : value }
{ property : value; property : value; }
...

The problem is with the optional ";" at the end, because it overlap with the starting character of {";", declaration}, so when my parser meet a semicolon in this context, it doesn't know what to do.

The book talk about this problem, but in its example, the semicolon is obligatory, so the rule can be modified like this :

block = "{", declaration, ";", {declaration, ";"}, "}"

So, Is it possible to achieve what I'm trying to do using a LL(1) parser?

like image 727
subb Avatar asked Nov 05 '22 14:11

subb


2 Answers

I think I got it:

//EBNF 
block = "{", decl, "}"
decl = simple-decl, [";", [decl]]
simple-decl = ...

//BNF
<block> =:: "{" <decl> "}"
<decl> =:: <simple-decl> <decl-end>
<decl-end> =:: ";" <decl-tail> | ""
<decl-tail> =:: <decl> | e

This produce the following code :

private function parseBlock():void {
    accept(Token.LBRACE);
    parseDecl();
    accept(Token.RBRACE);
}


//Token.IDENTIFIER is the starting token of a declaration
private function parseDecl():void {
    accept(Token.IDENTIFIER);
    if(_currentToken.kind == Token.SEMICOLON){
        accept(Token.SEMICOLON);
        if(_currentToken.kind == Token.IDENTIFIER){
            parseDecl();
        }
    }
}

I'm I right?

like image 56
subb Avatar answered Nov 11 '22 05:11

subb


If you allow null declarations, you can remove the ambiguity:

//EBNF
block = "{", declaration, {";", declaration}, "}"
declaration = "" | ...

//BNF
<block> =:: "{" <declaration-list> "}"
<declaration-list> =:: <declaration> | <declaration> ";" <declaration-list>
<declaration> =:: "" | ...

though this isn't a requirement:

// these don't allow empty declarations
block = "{", {declaration, ";"}, end-declaration, "}"
end-declaration = "" | declaration


<block> =:: "{" <declaration-list> "}"
<declaration-list> =:: "" | <declaration> | <declaration> ";" <declaration-list>

To handle nulls, figure out what terminals may come after the null, and have your parser recognize those. For recursive descent (in not-quite-java):

/*
<block> =:: "{" <declaration-list> "}"
<declaration-list> =:: <declaration> | <declaration> ";" <declaration-list>
<declaration> =:: "" | ...
*/
Node block(tokens) {
    terminal(BLOCKBEGIN, tokens);
    Node decls = declList(tokens);
    terminal(BLOCKEND, tokens);
}
Node declList(tokens) {
    Node dcl = decl(tokens);
    if (expect(LINESEP, tokens)) {
        terminal(LINESEP, tokens);
        return new DeclarationList(dcl, declList(tokens));
    } else {
        return new DeclarationList(dcl);
    }
}
Node decl(tokens) {
    if (expect(BLOCKEND, tokens)) {
        return new Declaration();
    }
    ...
}
/*
<block> =:: "{" <declaration-list> "}"
<declaration-list> =:: "" | <declaration> | <declaration> ";" <declaration-list>
*/
Node block(tokens) {
    terminal(BLOCKBEGIN, tokens);
    Node decls = declList(tokens);
    terminal(BLOCKEND, tokens);
}
Node declList(tokens) {
    if (expect(BLOCKEND, tokens)) {
        return new DeclarationList();
    }
    Node dcl = decl(tokens);
    if (expect(LINESEP, tokens)) {
        terminal(LINESEP, tokens);
        return new DeclarationList(dcl, declList(tokens));
    } else {
        return new DeclarationList(dcl);
    } 
}

For a top-down parser, the process is a little more explicit. When constructing the FOLLOWS relation, you recursively replace nulls with whatever follows the non-terminal that generated the null.

A → B C
B → b
C → D E
D → d | ""
E → e

FOLLOWS(B) ← FIRST(C) = FIRST(D) = {d, ""}
          += - {""} + FOLLOWS(D) = - {""} + FIRST(E) = - {""} + {e}
FOLLOWS(B) = {d, e}

You then fill out the parsing table as normal.

like image 34
outis Avatar answered Nov 11 '22 03:11

outis