I want to come up with a language syntax. I have read a bit about these three, and can't really see anything that one can do that another can't. Is there any reason to use one over another? Or is it just a matter of preference?
Just for the record: EBNF is not more powerful than BNF in terms of what languages it can define, just more convenient.
Advantages over BNF
The BNF uses the symbols (<, >, |, ::=) for itself, but does not include quotes around terminal strings. This prevents these characters from being used in the languages, and requires a special symbol for the empty string. In EBNF, terminals are strictly enclosed within quotation marks (“…” or '…').
In computer science, extended Backus–Naur form (EBNF) is a family of metasyntax notations, any of which can be used to express a context-free grammar. EBNF is used to make a formal description of a formal language such as a computer programming language.
Think of EBNF and ABNF as extensions that simply help you be more concise and expressive when writing your grammars.
For example, take an optional non-terminal symbol. In a BNF grammar you would have to define it using an intermediate symbol, like this:
A ::= OPTIONAL OTHER
OPTIONAL ::= opt_part | epsilon
while with EBNF you can do it directly using optional syntax:
A ::= [opt_part] OTHER
Likewise, since plain BNF has no way to group alternatives, you always need intermediate symbols for nested choices too:
BNF:
A ::= B C
B ::= a | b | c
EBNF:
A ::= (a | b | c) C
The same holds for many constructs that EBNF and ABNF allow through syntactic sugar but plain BNF does not. ABNF, which is a separate extension of BNF (defined in RFC 5234) rather than of EBNF, also lets you do things like state how many occurrences of a symbol may appear together (e.g. 4*DIGIT, meaning at least four digits).
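In ABNF a repetition prefix has the form <min>*<max>, with either bound optional, so 4*DIGIT reads as "at least four DIGITs" and a bare 4DIGIT as "exactly four". As a rough illustration only, the sketch below maps such a prefix onto a regular-expression quantifier. The function name repeat_to_regex is made up for this example and is not part of any ABNF tool.

```python
import re

def repeat_to_regex(prefix):
    """Translate an ABNF repetition prefix into a regex quantifier.

    Handles "4" (exactly four), "4*" (at least four), "*6" (at most six),
    "2*6" (two to six) and "*" (any number, including zero).
    """
    match = re.fullmatch(r"(\d*)(\*?)(\d*)", prefix)
    if match is None or prefix == "":
        raise ValueError(f"not an ABNF repetition prefix: {prefix!r}")
    low, star, high = match.groups()
    if not star:
        # A bare number such as "4" means exactly that many repetitions.
        return "{%s}" % low
    # A missing lower bound defaults to 0; a missing upper bound is unbounded.
    return "{%s,%s}" % (low or "0", high)

# Hypothetical usage: DIGIT is approximated here by the class [0-9].
DIGIT = "[0-9]"
at_least_four = re.compile(DIGIT + repeat_to_regex("4*"))
exactly_four = re.compile(DIGIT + repeat_to_regex("4"))
```

With these definitions, at_least_four.fullmatch("12345") succeeds while exactly_four.fullmatch("12345") does not, which is the difference between 4*DIGIT and 4DIGIT.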
So choosing ABNF or EBNF as the notation for your grammar will make your work easier: you can be more expressive without filling your grammar with helper symbols. Your parser generator will generate those symbols anyway, and you won't have to care about them!
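To make the "helper symbols" point concrete, here is a minimal sketch of how EBNF-style optionals and groups could be lowered into plain BNF. It is not how any particular parser generator works; the tuple encoding ("opt", ...) / ("alt", ...) and the generated names like _opt1 are assumptions made up for this example.

```python
from itertools import count

# Hypothetical mini-encoding of a right-hand side:
#   "x"                       a plain symbol
#   ("opt", [items])          EBNF  [ items ]
#   ("alt", [[items], ...])   EBNF  ( a | b | c )

def desugar(rules):
    """Rewrite EBNF-style rules into BNF rules that use only plain symbols.

    rules maps a nonterminal to a list of alternatives, each a list of items.
    An empty alternative stands for epsilon.
    """
    fresh = count(1)
    out = {}

    def lower_seq(items):
        result = []
        for item in items:
            if isinstance(item, str):
                result.append(item)
                continue
            kind, body = item
            name = f"_{kind}{next(fresh)}"
            if kind == "opt":
                # [x] becomes: helper ::= x | epsilon
                out[name] = [lower_seq(body), []]
            elif kind == "alt":
                # (a | b) becomes: helper ::= a | b
                out[name] = [lower_seq(alt) for alt in body]
            else:
                raise ValueError(f"unknown construct: {kind}")
            result.append(name)
        return result

    for lhs, alternatives in rules.items():
        out[lhs] = [lower_seq(alt) for alt in alternatives]
    return out

def to_bnf(rules):
    """Render desugared rules as BNF text, one rule per line."""
    lines = []
    for lhs, alternatives in rules.items():
        rhs = " | ".join(" ".join(alt) if alt else "epsilon" for alt in alternatives)
        lines.append(f"{lhs} ::= {rhs}")
    return "\n".join(lines)

# A ::= [opt_part] OTHER
optional_rule = {"A": [[("opt", ["opt_part"]), "OTHER"]]}
# A ::= (a | b | c) C
group_rule = {"A": [[("alt", [["a"], ["b"], ["c"]]), "C"]]}
```

Calling to_bnf(desugar(optional_rule)) yields a helper rule with an epsilon alternative plus the rewritten rule for A, which mirrors the hand-written BNF versions shown earlier in this answer.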
According to Wikipedia, ABNF's double quoted string literals are case-insensitive, and case-sensitive matches must be defined as numeric ASCII values. I consider that a disadvantage.
Literal text is specified through the use of a string enclosed in quotation marks ("). These strings are case-insensitive and the character set used is (US-)ASCII. Therefore the string “abc” will match “abc”, “Abc”, “aBc”, “abC”, “ABc”, “AbC”, “aBC”, and “ABC”. For a case-sensitive match the explicit characters must be defined: to match “aBc” the definition will be %d97.66.99.
https://en.wikipedia.org/wiki/Augmented_Backus%E2%80%93Naur_Form#Terminal_values
However, RFC 7405 seems to add case-sensitive string literals to ABNF.
https://www.rfc-editor.org/rfc/rfc7405
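To show what the case-sensitivity difference means in practice, here is a small sketch that mimics the two kinds of ABNF terminals discussed above. The helper names match_quoted and match_decimal are invented for illustration, and the sketch assumes plain ASCII input; it is not taken from any ABNF library.

```python
def match_quoted(literal, text):
    """Mimic an ABNF "..." literal: ASCII case-insensitive comparison."""
    return literal.lower() == text.lower()

def match_decimal(spec, text):
    """Mimic an ABNF %d terminal such as "%d97.66.99": exact character codes."""
    if not spec.startswith("%d"):
        raise ValueError(f"expected a %d terminal, got {spec!r}")
    codes = [int(part) for part in spec[2:].split(".")]
    return text == "".join(chr(code) for code in codes)
```

Under these assumptions, match_quoted("abc", "ABC") is true, while match_decimal("%d97.66.99", "abc") is false because the codes spell exactly "aBc".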