
Compiler compiler that consumes BNF

Is there any reason why there are no parser generators that consume straight BNF?

I'm familiar with JavaCC and Antlr, and recently came across Parse2. It seems that each has its own notation. BNF is really easy to read and the other notations aren't. BNF is unambiguous. Is there some inherent reason why I can't feed BNF to a compiler compiler and get a parse tree out?

ccleve asked Oct 23 '14

People also ask

What is BNF in compiler design?

BNF stands for Backus-Naur Form. It is a formal notation for describing the syntax of a programming language, introduced by John Backus and Peter Naur around 1960.

What is BNF give an example?

BNF is a notation for Chomsky's context-free grammars. Backus was familiar with Chomsky's work. As proposed by Backus, each formula defined a "class" whose name is enclosed in angle brackets, for example <ab>. Each of these names denotes a class of basic symbols.

What is the difference between BNF and EBNF?

BNF syntax can only represent a rule in one line, whereas in EBNF a terminating character, the semicolon, marks the end of a rule. Furthermore, EBNF includes mechanisms for enhancements, defining the number of repetitions, excluding alternatives, comments, etc.

Is Yacc a compiler?

Yacc (yet another compiler compiler) is a grammar parser and parser generator. That is, it is a program that reads a grammar specification and generates code that is able to organize input tokens in a syntactic tree in accordance with the grammar.


2 Answers

Marpa::R2, the Perl interface to Marpa, a general BNF parser, accepts straight BNF as a grammar description and generates a parser for it in Perl. Here is an example taken almost literally from a BNF grammar tutorial.

<tree>  ::= '(' <list> ')'
<list>  ::= <thing> | <list> ',' <thing>
<thing> ::= <tree> | <name>             
<name>  ::= 'ant' | 'bat' | 'cow' | 'dog' | 'cat'

The full code example.
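To illustrate what a parser for this four-rule grammar does, here is a minimal recursive-descent sketch in Python. This is my own illustration, not Marpa: Marpa handles the left-recursive <list> rule directly, whereas this sketch rewrites it as a repetition, list ::= thing (',' thing)*.

```python
import re

# The five allowed <name> terminals from the grammar above.
NAMES = {'ant', 'bat', 'cow', 'dog', 'cat'}

def tokenize(s):
    # Split e.g. "(ant,(bat,cow))" into tokens: '(', ')', ',' and names.
    return re.findall(r"[(),]|[a-z]+", s)

def parse(s):
    toks = tokenize(s)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def expect(t):
        nonlocal pos
        if peek() != t:
            raise SyntaxError(f"expected {t!r} at token {pos}")
        pos += 1

    def tree():                      # <tree> ::= '(' <list> ')'
        expect('(')
        node = lst()
        expect(')')
        return node

    def lst():                       # <list> ::= <thing> | <list> ',' <thing>
        items = [thing()]            # (left recursion rewritten as a loop)
        while peek() == ',':
            expect(',')
            items.append(thing())
        return items

    def thing():                     # <thing> ::= <tree> | <name>
        if peek() == '(':
            return tree()
        name = peek()
        if name in NAMES:
            expect(name)
            return name
        raise SyntaxError(f"unexpected token {peek()!r}")

    result = tree()
    if pos != len(toks):
        raise SyntaxError("trailing input")
    return result
```

For example, parse("(ant,(bat,cow))") returns the nested parse tree ['ant', ['bat', 'cow']].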

rns answered Sep 28 '22


Is there any reason why there are no parser generators that consume straight BNF?

No, but most parser generators started out trying to do everything. It took the community a long time to figure out that sometimes less is more.

Tools that consume (some variation) of BNF have been around for a long time.

A paper describing MetaII was published back in 1963. MetaII's baseline BNF is what you expect (left-hand side plus right-hand side) plus a number of now relatively standard EBNF extensions (Kleene star and plus, grouping, alternatives) and non-standard embedded actions (used to generate literally syntax-directed translation output). It assumes built-in token lexers. MetaII can meta-compile its own grammar to reproduce itself exactly [the paper shows this, and it is a mind-blowing moment when you grok why this works], thus enabling bootstrapping to more sophisticated versions, and compilation of other simple languages. A common easy extension is defining grammar rules for lexical tokens. I built a variety of implementations, and tools that used this, back in the 1970s because it was so cool and so useful.

Tree Meta added actions for building trees, and an auxiliary grammar for pattern matching trees to generate output.

When you add in all the extra EBNF and generation machinery, the resulting "BNF" for both MetaII and Tree Meta can be fairly hard for the uninitiated to read, mostly because it is dense and you have to be familiar with the context. Otherwise it looks like the output of a chain printer (for those of you old enough to know what that is) that broke.

Most modern compiler-compilers aren't much different. YACC has extended BNF with embedded (your favorite language) code to do actions, usually used to build trees. JavaCC uses an extended BNF; with JJTree you can build ASTs. ANTLR 1/2/3 also has an extended BNF, with embedded actions for building trees. That makes them just as, er, ugly as MetaII... no progress in 40 years IMHO.

Our DMS Software Reengineering Toolkit (see my bio) uses the barest BNF you can imagine, for truly complex grammars such as IBM Enterprise COBOL, Fortran 2005 and C++14. It looks like:

  LHS = RHS1 RHS2 ...  RHSn ;

for various tokens LHS, RHS1 ... RHSn. Lists are easy: you use two rules, one for the base case, one for list extension. Alternatives are easy: code another rule. It is technically straightforward to code grammar rules for tokens simply as grammar rules whose terminals are actual characters. (DMS offers, and we typically use, a real lexer for parsing speed reasons).
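To make the "lists via two rules" point concrete, consider a pure-BNF pair of rules such as number_list = NUMBER ; and number_list = number_list ',' NUMBER ;. A bottom-up parser like DMS consumes the left recursion directly; a hand-written top-down parser typically turns it into a loop. Here is a hypothetical Python sketch (names are my own) of the top-down reading:

```python
# Pure BNF:
#   number_list = NUMBER ;                      -- base case
#   number_list = number_list ',' NUMBER ;      -- list extension
# i.e. one or more NUMBERs separated by commas. A top-down parser
# realizes the left recursion as iteration: parse the base case,
# then keep applying the extension rule while a ',' follows.

def parse_number_list(tokens):
    # tokens: e.g. ['1', ',', '2', ',', '3']
    pos = 0

    def number():
        nonlocal pos
        tok = tokens[pos] if pos < len(tokens) else None
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected NUMBER, got {tok!r}")
        pos += 1
        return int(tok)

    values = [number()]                   # base rule
    while pos < len(tokens) and tokens[pos] == ',':
        pos += 1                          # extension rule
        values.append(number())
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return values
```

For example, parse_number_list(['1', ',', '2', ',', '3']) yields [1, 2, 3].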

Here is a DMS BNF for high school algebra (and a bit of calculus; some liberties in notation):

equation = sum ;
sum = product ;
sum = sum '+' product ;
sum = sum '-' product ;
product = term ;
product = product '*' term ;
product = product '/' term ;
term = primary ;
term = primary '^' term ;
primary = NUMBER ;
primary = NAME ;
primary = '(' sum ')' ;
primary = '-' primary ;
primary = NAME '(' sum ')' ;
primary = 'D' NAME ':' primary ; -- differentiate primary
primary = 'I' sum 'D' NAME ; -- indefinite integral
primary = 'I' sum ',' sum ':' sum 'D' NAME ; -- definite integral

A real DMS grammar file has other things in it for describing prettyprinting, etc. Here you can see a bigger example of a DMS grammar for M6809 assembly code.

What is interesting is that DMS builds ASTs using only the grammar as a guide; no additional tree-building actions. By avoiding the need to specify actions-while-parsing (to build tree nodes), the resulting grammars are pretty simple to read.
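As a loose sketch of what "trees from the grammar alone" means (my own construction in Python, not DMS internals): the only tree-building code is one uniform constructor keyed by the rule name, so the grammar author never writes per-rule actions, and revising the grammar never requires revising tree-building code. This parses a subset of the algebra grammar above (sum, product, primary over NUMBERs):

```python
# Every rule reduction uniformly becomes a node (rule_name, children...).
# No rule-specific action code exists anywhere; the tree shape follows
# the grammar directly, as with DMS's concrete trees.

class Parser:
    def __init__(self, tokens):
        self.toks, self.pos = tokens, 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def node(self, rule, *children):
        # The single, uniform "action": build a node named after the rule.
        return (rule, *children)

    def primary(self):                    # primary = NUMBER ;
        tok = self.take()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected NUMBER, got {tok!r}")
        return self.node('primary', tok)

    def product(self):                    # product = primary | product '*' primary ;
        n = self.node('product', self.primary())
        while self.peek() == '*':
            n = self.node('product', n, self.take(), self.primary())
        return n

    def sum(self):                        # sum = product | sum '+'/'-' product ;
        n = self.node('sum', self.product())
        while self.peek() in ('+', '-'):
            n = self.node('sum', n, self.take(), self.product())
        return n
```

For example, Parser(['1', '+', '2']).sum() produces ('sum', ('sum', ('product', ('primary', '1'))), '+', ('product', ('primary', '2'))), a concrete tree that mirrors the grammar rule by rule.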

DMS has been doing this since about 1998; it is the first tool I know of that took this approach. The ANTLR guys finally figured out this was a good idea, and now ANTLR4 as of 2013 will build a tree without any explicit embedded actions, although it still has actions for other purposes. DMS's trees can be either concrete (following the grammar directly) or "abstract" (many tree nodes are dropped, as they can be reconstructed by DMS on demand given the abstract tree and the grammar). The abstract trees are actually pretty decent. I don't know what ANTLR4 does here.

The really nice thing about this approach is one can write and revise really complicated, big, grammars by simply revising the rules; the construction of ASTs is "free" and it is always "right" with respect to the grammar. That allows DMS to provide a variety of other related tools (prettyprinting, source-to-source level transformation) using that as baseline. (DMS is technically a "meta" compiler in the sense that it can parse its own grammars using its own grammar; we use this capability of DMS to generate the parsers implied by those grammars).

You can see a full example of this at "algebra as a dms domain", also via my bio. (The BNF was taken from there.) I'd provide links, but many SO people dislike that.

Ira Baxter answered Sep 28 '22