Why some compilers prefer hand-crafted parser over parser generators?

Tags:

According to Vala documentation: "Before 0.3.1, Vala's parser was the classic flex scanner and Bison LALR parser combination. But as of commit eba85a, the parser is a hand-crafted recursive descent parser." My question is: Why?

The question could be addressed to any compiler which isn't using parser generator. What are pros and cons for such a move from parser generator to hand-crafted parser? What are disadvantages of using parser generators (Bison, ANTLR) for compilers?

As a side comment: I'm interested in Vala specifically because I like the idea of having language with modern features and clean syntax but compilable into "native" and "unmanaged" high-level language (C in case of Vala). I have found only Vala so far. I'm thinking of having fun by making Vala (or similar language) compilable to C++ (backed by Qt libs). But since I don't want to invent completely new language I'm thinking of taking some existing grammar. Obviously hand-crafted parsers don't have written formal grammar I might reuse. Your comments on this idea are welcome (is the whole idea silly?).

584

asked Mar 28 '13 02:03

vladimir

2 Answers

I have written half a dozen hand crafted parsers (in most cases recursive descent parser AKA top-down parser) in my career and have seen parsers generated by parser generators and I must admit I am biased against parser generators.

Here are some pros and cons for each approach.

Parser generators

Pros:

Quickly get a working parser (at least if you do not know how to hand code it).

Cons:

Generated code is hard to understand and debug.
Difficult to implement proper error handling. The generator will create a correct parser for syntactically correct code but will choke on incorrect code and in most cases will not be able to provide proper error messages.
A bug in parser generator may halt your project. You need to fix the bug in somebody else's code (if source code is available), wait for the author to fix it or workaround the bug (if possible at all).

Hand crafted recursive descent parser

Pros:

Generated code is easy to understand. Recursive parsers usually have one function corresponding to each language construct, e.g. parseWhile to parse a 'while' statement, parseDeclaration to parse a declaration and so on. Understanding and debugging the parser is easy.
It is easy to provide meaningful error messages, to recover from errors and continue parsing in the way that makes most sense in a particular situation.

Cons:

It will take some time to hand code the parser especially if you do not have an experience with this stuff.
The parser may be somewhat slow. This applies to all recursive parsers not just hand written ones. Having one function corresponding to each language construct to parse a simple numeric literal the parser may make a dozen or more nested calls starting from e.g. parseExpression through parseAddition, parseMultiplication, etc. parseLiteral. Function calls are relatively inexpensive in a language like C but still cam sum up to a significant time.

One solution to speedup a recursive parser is to replace parts of your recursive parser by a bottom-up sub-parser which often is much faster. The natural candidates for such sub-parser are the expressions which have almost uniform syntax (i.e. binary and unary expressions) with several precedence levels. The bottom-up parser for an expression is usually also simple to hand code, it is often just one loop getting input tokens from the lexer, a stack of values and a lookup table of operator precedence’s for operator tokens.

159

answered Oct 21 '22 07:10

TN.

LR(1) and LALR(1) parsers are really, really annoying for two reasons:

The parser generator isn't very good at producing useful error messages.
Certain kinds of ambiguity, like C-style if-else blocks, make writing the grammar very painful.

On the other hand, LL(1) grammar are much better at both of these things. The structure of LL(1) grammars makes them very easy to encode as recursive descent parsers, and so dealing with a parser generator is not really a win.

Also, in the case of Vala, the parser and the compiler itself as presented as a library, so you can build a custom backend for the Vala compiler using the Vala compiler library and get all the parsing and type checking and such for free.

answered Oct 21 '22 08:10

apmasell

Related questions
                            
                                Parse data from JSON in ReactJS
                            
                                Explanation about viable prefix
                            
                                How can I use NLP to parse recipe ingredients?
                            
                                Does Python have a module for parsing HTTP requests and responses?
                            
                                How to parse netstat command in order to get process name and PID from it?
                            
                                How to parse an HTML string in Google Apps Script without using XmlService? [duplicate]
                            
                                Remove tab ('\t') from string javascript
                            
                                Parsing PHP Doc Comments into a Data Structure
                            
                                jquery ajax call return JSON parsing error
                            
                                Parse string in HH.mm format to TimeSpan
                            
                                How can I split a text into sentences using the Stanford parser?
                            
                                Parse HTML via XPath [closed]
                            
                                How to parse a RFC 2822 date/time into a Python datetime?
                            
                                Fastest way to retrieve a <title> in PHP
                            
                                Parsing Twitter API Datestamp
                            
                                How to parse a text file with C#
                            
                                How do Java, C++, C#, etc. get around this particular syntactic ambiguity with < and >?
                            
                                Is there a lightweight multipart/form-data parser in C or C++? [closed]
                            
                                How to hide API keys in GitHub for iOS (SWIFT) projects?
                            
                                Golang parse JSON array into data structure

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Why some compilers prefer hand-crafted parser over parser generators?

Tags:

parsing

compiler-construction

vala

vladimir

People also ask

2 Answers

TN.

apmasell

Recent Activity

Donate For Us