Recently I've been studying parsers and grammars and how they work. I was reading over the formal grammar for JSON at http://www.ietf.org/rfc/rfc4627.txt, which uses EBNF. I was pretty confident in my understanding of BNF and EBNF, but apparently I still don't fully understand it. The RFC defines a JSON object like this:
object = begin-object [ member *( value-separator member ) ]
         end-object
I understand that the intent here is to express that any JSON object can (optionally) have a member, and then be followed by 0 or more (value-separator, member) pairs. What I don't understand is why the asterisk appears before the (value-separator member). Isn't the asterisk supposed to mimic regex, so that it appears after the item to be repeated 0 or more times? Shouldn't the JSON object grammar be written like this:
object = begin-object [ member ( value-separator member )* ]
         end-object
Two symbols added to EBNF that do not exist in BNF are the square brackets ([]) and curly braces ({}). The square brackets denote zero or one occurrence of an expansion, and the curly braces indicate that the expression may be repeated zero or more times.
EBNF defines production rules in which a sequence of symbols is assigned to a nonterminal:
digit excluding zero = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
digit                = "0" | digit excluding zero ;
The second rule defines the nonterminal digit, which appears on the left side of the assignment.
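To make the two digit productions concrete, here is a minimal hand-written recognizer in Python. The function names are hypothetical, chosen only to mirror the nonterminals; each alternation "|" becomes either a membership test over the terminals or an "or" that falls through to the other nonterminal.

```python
# Hypothetical recognizers mirroring the EBNF productions:
#   digit excluding zero = "1" | "2" | ... | "9" ;
#   digit                = "0" | digit excluding zero ;


def is_digit_excluding_zero(ch: str) -> bool:
    # Alternation of nine single-character terminals. The length check matters:
    # in Python, "" and "12" are both substrings of "123456789".
    return len(ch) == 1 and ch in "123456789"


def is_digit(ch: str) -> bool:
    # "0" | digit excluding zero: try the terminal first, then the nonterminal.
    return ch == "0" or is_digit_excluding_zero(ch)
```

For example, is_digit("0") is true while is_digit_excluding_zero("0") is false, and both reject multi-character input such as "12".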
Optional (zero or one time): in standard EBNF, optional elements are written inside square brackets.
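As a rough illustration, the EBNF bracket forms line up with regular-expression quantifiers: [ x ] behaves like (x)? and { x } like (x)*. The rule below, nonzero integer = [ "-" ] , digit excluding zero , { digit } ; is a hypothetical example invented for this sketch, not something from the RFC.

```python
import re

# Hypothetical EBNF rule (illustration only):
#   nonzero integer = [ "-" ] , digit excluding zero , { digit } ;
# Each EBNF construct translated to a regex fragment:
#   [ "-" ]               -> (?:-)?        optional: zero or one time
#   digit excluding zero  -> [1-9]
#   { digit }             -> (?:[0-9])*    repetition: zero or more times
NONZERO_INTEGER = re.compile(r"(?:-)?[1-9](?:[0-9])*")


def is_nonzero_integer(text: str) -> bool:
    # fullmatch: the entire string must be derivable from the rule.
    return NONZERO_INTEGER.fullmatch(text) is not None
```

Note that in the regex the quantifier follows the group, which is the postfix convention the question expects.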
In the document you mention, http://www.ietf.org/rfc/rfc4627.txt, it is stated that:
The grammatical rules in this document are to be interpreted as described in [RFC4234].
RFC4234 describes ABNF (Augmented BNF), not EBNF. If you look through this document, you will find the following definition:
3.6. Variable Repetition: *Rule
The operator "*" preceding an element indicates repetition. The full
form is:
<a>*<b>element
where <a> and <b> are optional decimal values, indicating at least
<a> and at most <b> occurrences of the element.
Default values are 0 and infinity so that *<element> allows any
number, including zero; 1*<element> requires at least one;
3*3<element> allows exactly 3 and 1*2<element> allows one or two.
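Since the question frames this in terms of regex, one way to see that the prefix form loses nothing is to translate it mechanically. The sketch below (the function name is hypothetical, and it covers only the <a>*<b> form quoted above, not ABNF's other repetition forms) turns an ABNF repetition prefix into the equivalent postfix regex quantifier, applying the defaults of 0 and infinity.

```python
import re

# Matches an ABNF variable-repetition prefix "<a>*<b>"; both bounds optional.
_REPEAT_PREFIX = re.compile(r"([0-9]*)\*([0-9]*)")


def abnf_repeat_to_regex(prefix: str) -> str:
    """Translate an ABNF prefix such as "*", "1*" or "1*2" into a regex quantifier."""
    m = _REPEAT_PREFIX.fullmatch(prefix)
    if m is None:
        raise ValueError(f"not an ABNF repetition prefix: {prefix!r}")
    low = int(m.group(1)) if m.group(1) else 0  # default minimum: 0
    high = m.group(2)                           # empty string means infinity
    if high and low > int(high):
        raise ValueError(f"minimum exceeds maximum in {prefix!r}")
    if low == 0 and not high:
        return "*"                              # any number, including zero
    if low == 1 and not high:
        return "+"                              # at least one
    if not high:
        return "{%d,}" % low                    # at least <a>
    return "{%d,%d}" % (low, int(high))         # between <a> and <b>
```

So *( value-separator member ) corresponds to the regex-style ( value-separator member )*: the same language, with the quantifier written on the other side.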
So the notation
*( value-separator member )
is correct according to the ABNF definition and allows any number of repetitions, including zero.
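In a hand-written recursive-descent parser, the optional brackets become an if and the zero-or-more repetition becomes a while loop, regardless of which side of the group the star is written on. The sketch below follows the RFC 4627 object rule but is a hypothetical, heavily simplified subset of JSON: no whitespace, names without escapes, and unsigned integers as the only values. The class and function names are made up for illustration.

```python
from __future__ import annotations

# Simplified recognizer for the RFC 4627 production
#   object = begin-object [ member *( value-separator member ) ] end-object
# HYPOTHETICAL SUBSET for illustration: no whitespace, names are "..." strings
# without escapes, and every value is an unsigned integer.

DIGITS = frozenset("0123456789")


class ParseError(ValueError):
    pass


class ObjectParser:
    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def peek(self) -> str:
        # Current character, or "" at end of input.
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, ch: str) -> None:
        if self.peek() != ch:
            raise ParseError(f"expected {ch!r} at offset {self.pos}")
        self.pos += 1

    def name(self) -> str:
        # Simplified 'string': a quote, any non-quote characters, a quote.
        self.expect('"')
        start = self.pos
        while self.peek() not in ('"', ""):
            self.pos += 1
        text = self.text[start:self.pos]
        self.expect('"')
        return text

    def value(self) -> int:
        # Simplified 'value': one or more decimal digits.
        start = self.pos
        while self.peek() in DIGITS:
            self.pos += 1
        if self.pos == start:
            raise ParseError(f"expected a digit at offset {self.pos}")
        return int(self.text[start:self.pos])

    def member(self) -> tuple[str, int]:
        # member = string name-separator value
        key = self.name()
        self.expect(":")  # name-separator
        return key, self.value()

    def obj(self) -> dict[str, int]:
        self.expect("{")  # begin-object
        result: dict[str, int] = {}
        if self.peek() != "}":  # [ ... ]: the whole member list is optional
            key, val = self.member()
            result[key] = val
            # *( value-separator member ): zero or more further members
            while self.peek() == ",":
                self.expect(",")  # value-separator
                key, val = self.member()
                result[key] = val
        self.expect("}")  # end-object
        return result


def parse_object(text: str) -> dict[str, int]:
    parser = ObjectParser(text)
    result = parser.obj()
    if parser.pos != len(text):
        raise ParseError(f"unexpected trailing input at offset {parser.pos}")
    return result
```

With this sketch, parse_object('{"a":1,"b":22}') yields {"a": 1, "b": 22}, while a trailing comma as in '{"a":1,}' is rejected, because every value-separator must be followed by a member.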
Syntax is about the way somebody chooses to write down concrete entities to represent something.
I'll agree that putting the Kleene star before the entity to be repeated is non-standard, and the authors' choice to do that simply confuses people who are used to the convention. But it is perfectly valid; the authors get to define what their syntax means, and you, the user of the standard, just get to accept it.
There's some argument for putting the Kleene star where they did: it announces that a list follows, right at the point where the list begins. The postfix Kleene star conveys the same thing, but it comes as something of a surprise: first you read the list element (from left to right), and only then do you discover the star.
As a practical matter, the surprise factor of a postfix Kleene star generally isn't enough to outweigh the surprise factor of violating convention. But the authors of that standard made their choice.
Welcome to syntax.