Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Choice of Parser Generator [closed]

Tags:

OK, I understand this question may sound quite opinion-based, however since I have several specific criteria of choice, I think it would make a nice fit for SO. So, here I am...

I've worked with compiler/interpreter construction in the past quite a lot (mostly as a hobby obviously) and for some reason I stuck with Lex/Yacc (or Flex/Bison, I'm quite confused as to how they call them now... lol).

However, since I'm finding myself currently playing with yet another hobbyist interpreter project, I thought I should give a try to something different maybe in order to avoid what I don't like about Lex/Yacc.

So, namely :

  • Better C++-friendly (than C-friendly)
  • Good documentation (preferably with some existing grammars already implemented + instructions on how to compile/use them - sounds rather obvious, huh?)
  • Could be LALR, LL(*), Recursive descent, I don't really care (note: an input as to which type you would prefer and for what type of implementations would be great though; I've never really understood their pros and cons, to be perfectly honest, even though I do know what they refer to)
  • Combining the Lexer part and the Parser grammar in one file wouldn't be bad at all; never really got it why it has to be split in two.
  • Last but not least : I've always had issues with... issues. I mean - at least as far as Lex/Yacc go, parsing error messages are more-or-less cryptic (Syntax Error... Yuhuu!) and rarely do they help diagnose an issue. (Well, unless you are the one who developed the interpreter... lol). So, is there anything better than Lex/Yacc regarding error reporting?

OK, I hope that wasn't too verbose. I'm all ears! :-)

like image 476
Dr.Kameleon Avatar asked Mar 01 '14 00:03

Dr.Kameleon


People also ask

What is the best parser generator?

Java Compiler Compiler (JavaCC) is the most popular parser generator for use with Java applications. A parser generator is a tool that reads a grammar specification and converts it to a Java program that can recognize matches to the grammar.

How does a parser generator work?

A parser generator takes a grammar as input and automatically generates source code that can parse streams of characters using the grammar. The generated code is a parser, which takes a sequence of characters and tries to match the sequence against the grammar.

Which tool is used for parser generator?

Johnson. It is a look ahead left-to-right (LALR) parser generator, generating a parser, the part of a compiler that tries to make syntactic sense of the source code, specifically a LALR parser, based on an analytic grammar written in a notation similar to Backus naur form (BNF).


1 Answers

I've been building parser generators and parsers since 1969.

Recursive descent, YACC and JavaCC are the typical answers you hear.

These are your grandpa's parser generators, and suffer from limitations in the grammars they will accept. Invariably, (esp on Stack Overflow), some poor soul asks "how do I solve this shift/reduce" problem (for LR parser generators like YACC) or "how do I eliminate left recursion" (for recursive descent or LL parser generators like JavaCC). Worse, they can't handle grammars that really have syntactic ambiguity, as occurs in most complex languages.

GLR (and GLL) parsers allow you to write context-free grammars ... and parse them, with no fuss or muss. This is a real productivity enhancement. There's a price: you can end up with ambiguous parses but there are ways to handle that. (see this discussion of C++ parsing troubles that neither YACC nor JavaCC can handle by themselves).

Bison (widely available) has a GLR option; use it! Recent multilingual program manipulation tools seems to all use GLL or GLR. Our DMS Software Reengineering Toolkit uses GLR and parses C++ (full C++14 in MS and GNU variants!), Java, COBOL, and a raft of other complicated languages; GLR has been one of the best technical choices I've made in my career. Stratego uses GLR. I think RascalMPL uses GLL. Scott McPeak's Elkhound GLR parser generator is C++ based and generates, I'm pretty sure, C++ code (OP asked for a C++ based answer).

Hot topics these days are PEG and ANTLR4. These are better that LL or LR parsers but still give one grief in trying to shape the grammar. (With PEG, you have to order the productions, assuming you can find such an order, to handle ambiguous rules with priorities. With ANTLR4, you still have specify lookaheads to resolve ambiguity; I don't know how it handles infinite lookahead). AFAIK, nobody has built practical C++ parsers with either of these technologies, so they aren't living up to their reputation.

I think GLR and GLL are much, much better answers.

like image 155
Ira Baxter Avatar answered Oct 24 '22 21:10

Ira Baxter