Question about EBNF notation and JSON

Recently I've been studying parsers and grammars and how they work. I was reading over the formal grammar for JSON at http://www.ietf.org/rfc/rfc4627.txt, which uses EBNF. I was pretty confident in my understanding of BNF and EBNF, but apparently I still don't fully understand it. The RFC defines a JSON object like this:

  object = begin-object [ member *( value-separator member ) ]
           end-object

I understand that the intent here is to express that any JSON object can (optionally) have a member, and then be followed by 0 or more (value-separator, member) pairs. What I don't understand is why the asterisk appears before the (value-separator member). Isn't the asterisk supposed to mimic regex, so that it appears after the item to be repeated 0 or more times? Shouldn't the JSON object grammar be written like this:

  object = begin-object [ member ( value-separator member )* ]
           end-object

Channel72 asked Nov 07 '10 15:11


2 Answers

The document you mention, http://www.ietf.org/rfc/rfc4627.txt, states:

The grammatical rules in this document are to be interpreted as described in [RFC4234].

RFC 4234 describes ABNF (Augmented BNF), not EBNF. It contains the following definition:

3.6.  Variable Repetition:  *Rule

   The operator "*" preceding an element indicates repetition.  The full
   form is:

         <a>*<b>element

   where <a> and <b> are optional decimal values, indicating at least
   <a> and at most <b> occurrences of the element.

   Default values are 0 and infinity so that *<element> allows any
   number, including zero; 1*<element> requires at least one;
   3*3<element> allows exactly 3 and 1*2<element> allows one or two.

So the notation

*( value-separator member )

is correct according to the ABNF definition: it allows any number of repetitions, including zero.
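
To make the mapping to the regex-style postfix form concrete, here is a minimal Python sketch. The helper names parse_repeat and to_regex_quantifier are my own, not part of any RFC or library, and the sketch only handles the section 3.6 prefix form quoted above (it does not handle the bare-number form such as 3element). The last part uses "m" and "," as toy stand-ins for member and value-separator, which is a simplification and not real JSON.

```python
import re

# Matches an ABNF variable-repetition prefix: optional digits, "*", optional digits.
_REPEAT = re.compile(r"^(\d*)\*(\d*)$")


def parse_repeat(prefix):
    """Split a prefix such as "*", "1*", "3*3" or "1*2" into (minimum, maximum).

    The maximum is None when the repetition is unbounded.
    """
    m = _REPEAT.match(prefix)
    if m is None:
        raise ValueError(f"not an ABNF repetition prefix: {prefix!r}")
    low = int(m.group(1)) if m.group(1) else 0      # default minimum is 0
    high = int(m.group(2)) if m.group(2) else None  # default maximum is infinity
    return low, high


def to_regex_quantifier(prefix):
    """Translate the prefix into the equivalent postfix regex quantifier."""
    low, high = parse_repeat(prefix)
    if (low, high) == (0, None):
        return "*"
    if (low, high) == (1, None):
        return "+"
    if high is None:
        return "{%d,}" % low
    if low == high:
        return "{%d}" % low
    return "{%d,%d}" % (low, high)


# Toy version of: begin-object [ member *( value-separator member ) ] end-object
# Assumption: "m" stands for member and "," for value-separator.
object_rule = re.compile(r"\{(?:m(?:,m)" + to_regex_quantifier("*") + r")?\}")

for text in ["{}", "{m}", "{m,m,m}", "{m,}"]:
    print(text, bool(object_rule.fullmatch(text)))
```

In other words, the prefix * in ABNF and the postfix * you expected describe the same repetition; only the position of the operator differs.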

hooke answered Nov 03 '22 20:11


Syntax is about the way somebody chooses to write down concrete entities to represent something.

I'll agree that putting the Kleene star before the entity to be repeated is non-standard, and the authors' choice to do that simply confuses people who are used to the convention. But it is perfectly valid; the authors get to define what the syntax means, and you, the user of the standard, just get to accept it.

There's some argument for putting the Kleene star where they did: it signals that a list follows at a point where you might expect a list. The suffix-style Kleene star indicates the same thing, but it comes as something of a surprise; first you read the list element (from left to right), then you discover the star.

As a practical matter, the surprise factor of post-Kleene-star isn't enough in general to outweigh the surprise factor of violating convention. But the authors of that standard made their choice.

Welcome to syntax.

Ira Baxter answered Nov 03 '22 22:11