Ebnf – Is this an LL(1) grammar?

I found the following EBNF on Wikipedia, describing EBNF itself:

letter = "A" | "B" | "C" | "D" | "E" | "F" | "G"
   | "H" | "I" | "J" | "K" | "L" | "M" | "N"
   | "O" | "P" | "Q" | "R" | "S" | "T" | "U"
   | "V" | "W" | "X" | "Y" | "Z" ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
symbol = "[" | "]" | "{" | "}" | "(" | ")" | "<" | ">"
   | "'" | '"' | "=" | "|" | "." | "," | ";" ;
character = letter | digit | symbol | "_" ;

identifier = letter , { letter | digit | "_" } ;
terminal = "'" , character , { character } , "'" 
     | '"' , character , { character } , '"' ;

lhs = identifier ;
rhs = identifier
 | terminal
 | "[" , rhs , "]"
 | "{" , rhs , "}"
 | "(" , rhs , ")"
 | rhs , "|" , rhs
 | rhs , "," , rhs ;

rule = lhs , "=" , rhs , ";" ;
grammar = { rule } ;

Because of my limited knowledge of parsers and grammars, I don't know whether this is an LL(1) grammar. I tried to write a parser for it, but it fails when trying to read rhs, which reads itself again, which reads itself again, which... well, you get the idea.

Is it an LL(1) grammar?
If not, how can I turn it into one (if that is possible at all)?

577

asked Nov 24 '13 13:11

tilpner

2 Answers

The quoted Wikipedia extract is not a correct EBNF grammar for EBNF. It's also not left-parseable: indeed, it is ambiguous, so it's not unambiguously parseable at all.

In general, the terms LL(k) and LR(k) (and many other such terms) apply to Context-Free Grammars (CFGs) (and, by extension, the languages generated by those grammars). EBNF is not a formalism for describing CFGs. It is designed to be a formalism to describe context-free languages and therefore it should be possible to create a CFG from a given EBNF grammar (but see Note 1), but there is not a direct correspondence between an EBNF syntax rule and a single production in a CFG.

That said, you can usually directly create a CFG by using some standard transformations. For example:

{ ... }

can be substituted with the generated non-terminal M'', with the addition of the following productions: (ε is the empty string)

M'  → ...
M'' → ε
M'' → M' M''

The above transformation does not introduce left-recursion, so it does not artificially make the original grammar non-LL(1).
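
As a rough illustration of that substitution, here is a minimal Python sketch. The function name `expand_repetition` and the list-of-symbols representation are assumptions made for this example; only the M' / M'' naming comes from the productions above.

```python
def expand_repetition(body, base="M"):
    """Return (replacement non-terminal, new productions) for an EBNF `{ body }`.

    `body` is a list of grammar symbols. An empty list on a right-hand side
    stands for the empty string (epsilon).
    """
    inner = base + "'"   # M'  derives the repeated body exactly once
    rep = base + "''"    # M'' derives zero or more copies of M'
    productions = [
        (inner, list(body)),    # M'  -> ...
        (rep, []),              # M'' -> epsilon
        (rep, [inner, rep]),    # M'' -> M' M''  (right-recursive, never left-recursive)
    ]
    return rep, productions


# Hypothetical usage: expand the repetition { "," , primary }.
name, prods = expand_repetition(['","', "primary"])
for lhs, rhs in prods:
    print(lhs, "->", " ".join(rhs) or "ε")
```

The recursive production puts M'' last, which is why the expansion cannot introduce left recursion by itself.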

The most important error in the cited grammar [Note 2] is the ambiguous EBNF rule:

rhs = identifier
    | terminal
    | "[" , rhs , "]"
    | "{" , rhs , "}"
    | "(" , rhs , ")"
    | rhs , "|" , rhs
    | rhs , "," , rhs
    ;

It's also left-recursive, so it will not correspond to an LL(1) CFG. But more importantly, it does not indicate either the associativity or the precedence of the | and , operators. (Semantically, these operators do not have a defined associativity, but the syntax should still specify one; otherwise, it is impossible to unambiguously create a parse tree. The precedence between the two operators is important semantically.)
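
To make the ambiguity concrete, the following sketch counts how many distinct parse trees a token sequence has under a cut-down version of the rule. It is a simplification: it assumes only identifiers and the two binary operators (no brackets), and `count_parses` is a name invented for this example.

```python
from functools import lru_cache

OPERATORS = ("|", ",")


def count_parses(tokens):
    """Count parse trees for  rhs = NAME | rhs "|" rhs | rhs "," rhs ."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways tokens[i:j] can be derived from rhs.
        if j - i == 1:
            return 0 if tokens[i] in OPERATORS else 1
        total = 0
        # Try every operator as the top-level split point.
        for k in range(i + 1, j - 1):
            if tokens[k] in OPERATORS:
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parses(["a", "|", "b", ",", "c"]))  # 2
```

The two trees correspond to `(a | b) , c` and `a | (b , c)`. Nothing in the rule says which one is meant.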

A better set of rules would be:

primary = identifier
        | terminal
        | "[" , rhs , "]"
        | "{" , rhs , "}"
        | "(" , rhs , ")"
        ;
factor  = primary , { "," , primary } ;
rhs     = factor ,  { "|" , factor } ;

This is still an oversimplification, but it covers a large number of use cases. It's neither ambiguous nor left-recursive. [Note 3]
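
A layered grammar of this shape maps directly onto a recursive descent parser with one token of lookahead. The sketch below is only an illustration: the tokenizer, the tuple-based tree, and all function names are assumptions of this example. It also assumes that `,` binds more tightly than `|`, as in ISO EBNF, so the two loops would need to be swapped to get the opposite precedence.

```python
import re

TOKEN = re.compile(r"""[A-Za-z][A-Za-z0-9_]*|'[^']+'|"[^"]+"|[\[\]{}()|,]""")
CLOSERS = {"[": "]", "{": "}", "(": ")"}


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, got {self.peek()!r}")
        self.pos += 1

    def rhs(self):
        # Loosest level: sequences separated by "|".
        items = [self.sequence()]
        while self.peek() == "|":
            self.pos += 1
            items.append(self.sequence())
        return items[0] if len(items) == 1 else ("alt", *items)

    def sequence(self):
        # Tighter level: primaries separated by ",".
        items = [self.primary()]
        while self.peek() == ",":
            self.pos += 1
            items.append(self.primary())
        return items[0] if len(items) == 1 else ("seq", *items)

    def primary(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok in CLOSERS:
            # Bracketed group: the only place where rhs recurses.
            self.pos += 1
            inner = self.rhs()
            self.expect(CLOSERS[tok])
            return (tok + CLOSERS[tok], inner)
        if tok in ("]", "}", ")", "|", ","):
            raise SyntaxError(f"unexpected {tok!r}")
        self.pos += 1  # identifier or quoted terminal
        return tok


def parse_rhs(text):
    parser = Parser(tokenize(text))
    tree = parser.rhs()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input at token {parser.pos}")
    return tree


print(parse_rhs('a , b | "c" , { d }'))
```

Every decision is made by looking at the next token only, and each recursive call to `rhs` happens after an opening bracket has been consumed, so the parser cannot loop without making progress.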

Notes

1. Syntactic constraints specified in comments may not be easy to translate into CFGs, though. For example, the ISO standard EBNF for EBNF defines the non-terminal "syntactic exception" as follows:

   syntactic exception = ? a syntactic-factor that could be replaced by a syntactic-factor containing no meta-identifiers ?

   The intention of the above text is to restrict an exception to be a regular language. That's important, since the set difference between two context-free languages is not necessarily context-free, while the difference between a context-free language and a regular language is provably context-free. Ironically, the "special sequence" describing this restriction cannot be expressed as a context-free grammar because it depends on the definition of meta-identifiers. (If it had said "a syntactic-factor containing no meta-identifiers" it would be easy to write without resorting to a special sequence, but clearly that was not the intent.)

2. There is another important error in the Wikipedia excerpt. It defines both types of quoted strings as having the same body, but that's not correct; a double-quoted string cannot include double-quote characters, and a single-quoted string cannot contain single-quote characters. So the use of the identifier character in both of those definitions is incorrect.

3. The formal EBNF grammar allows primary to be empty. I left that out, because it's not usually needed.
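
The point about quoted strings can be sketched with two separate patterns, one per delimiter. This is an illustration only: the names are made up, and the body is assumed to be "any character except the delimiter", which is broader than the `character` set in the excerpt.

```python
import re

# Each kind of terminal excludes only its own delimiter from the body.
SINGLE_QUOTED = re.compile(r"'[^']+'")
DOUBLE_QUOTED = re.compile(r'"[^"]+"')


def is_terminal(text):
    """True if `text` is one complete, non-empty quoted terminal."""
    return bool(SINGLE_QUOTED.fullmatch(text) or DOUBLE_QUOTED.fullmatch(text))


print(is_terminal("'\"'"))   # a double quote inside single quotes
print(is_terminal("'''"))    # a single quote inside single quotes
```

With a shared body for both forms, the second input would be impossible to delimit unambiguously, which is why the two definitions have to differ.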

129

answered Sep 20 '22 19:09

rici

In short, no, your grammar is not LL(1).

The first reason is the left recursion of rhs that you already discovered. I assume you wrote a recursive descent parser (or something else that is based on LL(1) grammars). Such a parser cannot handle left-recursive rules, since they cause a special case of a so-called FIRST/FIRST conflict (cf. 1).

To tackle this problem and answer the second part of your question, you can left-factor your grammar and replace left-recursion as shown in 2.
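
For reference, the textbook rewrite for immediate left recursion turns A → A α | β into A → β A' and A' → α A' | ε. The sketch below applies it to a single rule; the function name and the list-based grammar representation are assumptions of this example, not something taken from the linked material.

```python
def eliminate_left_recursion(name, alternatives):
    """Rewrite  A -> A a | b  as  A -> b A'  and  A' -> a A' | epsilon.

    `alternatives` is a list of right-hand sides, each a list of symbols.
    An empty list stands for epsilon.
    """
    tail = name + "'"
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == name]
    others = [alt for alt in alternatives if not alt or alt[0] != name]
    if not recursive:
        return {name: [list(alt) for alt in alternatives]}
    if [] in recursive:
        raise ValueError(f"{name} -> {name} is a cycle and cannot be rewritten")
    if not others:
        raise ValueError(f"{name} has no non-left-recursive alternative")
    return {
        name: [list(beta) + [tail] for beta in others],
        tail: [list(alpha) + [tail] for alpha in recursive] + [[]],
    }


# The rhs rule from the question, with only one bracket form kept for brevity.
rules = eliminate_left_recursion("rhs", [
    ["identifier"],
    ["terminal"],
    ["(", "rhs", ")"],
    ["rhs", "|", "rhs"],
    ["rhs", ",", "rhs"],
])
for lhs, alts in rules.items():
    for alt in alts:
        print(lhs, "->", " ".join(alt) or "ε")
```

Note that this mechanical step alone is not enough here. The original rule is ambiguous, and the rewritten rule inherits that ambiguity, so it still has a conflict on `|` and `,` and the operator levels also have to be separated.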

answered Sep 21 '22 19:09

ojlr

Tags: parsing, ll, ebnf
