I want to write a parser generator for educational purposes, and was wondering if there are some nice online resources or tutorials that explain how to write one. Something along the lines of "Let's Build a Compiler" by Jack Crenshaw.
I want to write the parser generator for LR(1) grammars.
I have a decent understanding of the theory behind generating the action and goto tables, but want a resource that will help me implement it.
Preferred languages are C/C++ and Java, though other languages are OK too.
Thanks.
ANTLR is a mature and widely used parser generator for Java and other languages.
Java Compiler Compiler (JavaCC) is a popular parser generator for use with Java applications. A parser generator is a tool that reads a grammar specification and converts it to a program (a Java program, in JavaCC's case) that can recognize matches to the grammar.
If you know exactly what language you are going to parse, writing a parser by hand is straightforward (although laborious). If you don't know the language yet, refactoring the parser as the language changes can be quite difficult. You need good test cases so that you don't break corner cases.
I agree with the others: the Dragon Book is good background for LR parsing.
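To make the action/goto tables concrete, here is a minimal sketch of the table-driven driver loop that consumes them. The tables are written out by hand for a toy grammar (E -> E + n | n) that I picked purely for illustration; a real generator would compute them from the grammar. The names `ACTION`, `GOTO`, `PRODUCTIONS` and `lr_parse` are my own, not taken from any book or tool.

```python
# Toy grammar (an assumption for illustration):
#   1: E -> E + n
#   2: E -> n
# Each production is stored as (left-hand side, length of right-hand side).
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1)}

# Hand-built tables for the five parser states of this grammar.
ACTION = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "+"): ("reduce", 2),
    (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1),
    (4, "$"): ("reduce", 1),
}
GOTO = {(0, "E"): 1}


def lr_parse(tokens):
    """Parse a token list; return the production numbers in reduction order."""
    stack = [0]                      # stack of parser states
    tokens = list(tokens) + ["$"]    # append the end-of-input marker
    pos = 0
    reductions = []
    while True:
        state, lookahead = stack[-1], tokens[pos]
        entry = ACTION.get((state, lookahead))
        if entry is None:
            raise SyntaxError(f"unexpected {lookahead!r} in state {state}")
        kind, arg = entry
        if kind == "shift":
            stack.append(arg)        # push the new state, consume the token
            pos += 1
        elif kind == "reduce":
            lhs, rhs_len = PRODUCTIONS[arg]
            del stack[-rhs_len:]     # pop one state per right-hand-side symbol
            stack.append(GOTO[(stack[-1], lhs)])
            reductions.append(arg)
        else:                        # accept
            return reductions
```

For example, `lr_parse(["n", "+", "n"])` returns `[2, 1]`: first `E -> n` is reduced, then `E -> E + n`. Once this loop works against hand-written tables, the generator itself only has to produce the two dictionaries.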
If you are interested in recursive descent parsers, an enormously fun learning experience is this website, which walks you through building a completely self-contained compiler system that can compile itself and other languages:
MetaII Compiler Tutorial
This is all based on an amazing little 10-page technical paper by Val Schorre: META II: A Syntax-Oriented Compiler Writing Language from honest-to-god 1964. I learned how to build compilers from this back in 1970. There's a mind-blowing moment when you finally grok how the compiler can regenerate itself....
I know the website author from my college days, but have nothing to do with the website.
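If you have not seen recursive descent before, here is a rough hand-written sketch of the idea: one method per grammar rule, each calling the others. This is not META II and not code from the tutorial; the tiny arithmetic grammar and the names `Parser` and `evaluate` are assumptions made for illustration.

```python
import re


def tokenize(text):
    # Numbers, operators and parentheses; anything else is silently skipped.
    return re.findall(r"\d+|[-+*()]", text)


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    # expr := term (('+' | '-') term)*
    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    # term := factor ('*' factor)*
    def term(self):
        value = self.factor()
        while self.peek() == "*":
            self.take()
            value *= self.factor()
        return value

    # factor := NUMBER | '(' expr ')'
    def factor(self):
        tok = self.take()
        if tok == "(":
            value = self.expr()
            self.take(")")
            return value
        if tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text):
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input at {parser.peek()!r}")
    return value
```

With this, `evaluate("2+3*(4-1)")` gives `11`. The structure of the code mirrors the grammar rule for rule, which is what makes this style pleasant to write by hand.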
If you want to go the Python route, I would recommend the following: pyparsing and the book Text Processing in Python.
I have found both of these to be extremely helpful, and Paul McGuire, the author of pyparsing, is great about helping you out when you run into problems. The book Text Processing in Python is a handy reference to have at your fingertips and helps get you into the right frame of mind when attempting to build a parser.
I would also point out that an OO language is well suited to writing a parsing engine because it is extensible, and polymorphism is the right way to do it (IMHO). Approaching the problem as a state machine, rather than as "look for a semicolon at the end of xyz", will make your parser much more robust in the end.
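As a rough illustration of the state-machine point, here is a small sketch that checks statements of the form `name = number ;` with a transition table instead of scanning ahead for a semicolon. The token kinds, the state names and the function `count_statements` are all made up for this example.

```python
# Transition table: (current state, token kind) -> next state.
# Any pair that is missing from the table is a syntax error.
TRANSITIONS = {
    ("start", "IDENT"): "have_name",
    ("have_name", "="): "have_eq",
    ("have_eq", "NUMBER"): "have_value",
    ("have_value", ";"): "start",
}


def count_statements(token_kinds):
    """Validate a sequence of token kinds; return how many statements it holds."""
    state = "start"
    completed = 0
    for index, kind in enumerate(token_kinds):
        next_state = TRANSITIONS.get((state, kind))
        if next_state is None:
            raise SyntaxError(
                f"token {index}: unexpected {kind!r} in state {state!r}"
            )
        if next_state == "start":
            completed += 1           # a ';' just closed a statement
        state = next_state
    if state != "start":
        raise SyntaxError(f"input ended in state {state!r}")
    return completed
```

Calling `count_statements(["IDENT", "=", "NUMBER", ";"])` returns `1`, while a sequence such as `["IDENT", "=", ";"]` raises a `SyntaxError` that names the exact state and token. Supporting a new statement form means adding rows to the table rather than rewriting the scanning logic.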
Hope that helps!