
Reasons for using lex/yacc alternatives?

About once a year I have to develop, or at least design, a grammar and a parser; that appears to be a constant of my working life.

Every time I face this task, so about once a year, I, very much a lex/yacc (or rather flex/bison) guy, consider or reconsider alternatives to plain lex/yacc, and after some musing and experimenting I go back to plain lex/yacc.

Because I have a CORBA server at the hub of the application, I can call into it from a parser written in almost any language, so this time I had a look at

  • antlr4 (Java) and antlr3 (Java, but has runtimes for other languages),
  • SableCC (Java),
  • Parse::EBNF, Parse::Yapp and Marpa (Perl),
  • and SimpleParse (Python).

For me, the tandem of antlr4 and antlrworks looked like the most promising candidate, but I'm not yet convinced that the time spent on getting into it will be amortized in the end.


The grammar I have to develop is similar to SQL DDL (in terms of structure, not in terms of the subject).

Why would any of the alternatives make my task easier than using plain lex/yacc?

Solkar asked May 13 '13 15:05




1 Answer

What you should also consider is that the various parser generators generate quite different parsers. Yacc/bison produce bottom-up parsers, which are often hard to understand and hard to debug, and which give weird error messages. ANTLR, for instance, produces a recursive-descent top-down parser, which is much easier to understand and which you can actually debug easily. You can also use just a subrule as the entry point for a parse operation (e.g. parse only expressions instead of the full language).
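To make the "recursive descent" point concrete, here is a minimal hand-written sketch in Python. It is not ANTLR output and not ANTLR's API; the toy grammar, the `Parser` class and the `tokenize` helper are all made up for illustration. It only shows the general shape such parsers have: one method per grammar rule, so you can step through them in a debugger and call any rule directly as an entry point.

```python
import re

# Hypothetical toy grammar (an assumption for this sketch):
#   program : expr (';' expr)* EOF
#   expr    : term (('+' | '-') term)*
#   term    : factor ('*' factor)*
#   factor  : NUM | '(' expr ')'
TOKEN = re.compile(r"\s*(?:(\d+)|(.))")


def tokenize(text):
    """Split text into ("num", int) and ("op", char) tokens."""
    tokens = []
    for number, op in TOKEN.findall(text):
        if number:
            tokens.append(("num", int(number)))
        elif op.strip():
            tokens.append(("op", op))
    return tokens


class Parser:
    """One method per grammar rule; each rule evaluates as it parses."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("eof", None)

    def take(self, kind, value=None):
        tok = self.peek()
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise SyntaxError(
                f"expected {value or kind} at token {self.pos}, got {tok[1]!r}"
            )
        self.pos += 1
        return tok[1]

    def program(self):
        results = [self.expr()]
        while self.peek() == ("op", ";"):
            self.take("op", ";")
            results.append(self.expr())
        self.take("eof")
        return results

    def expr(self):
        value = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            op = self.take("op")
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() == ("op", "*"):
            self.take("op", "*")
            value *= self.factor()
        return value

    def factor(self):
        if self.peek() == ("op", "("):
            self.take("op", "(")
            value = self.expr()
            self.take("op", ")")
            return value
        return self.take("num")


# Full language via the top rule, or just the 'expr' sub-rule on its own.
full = Parser(tokenize("1 + 2 * 3; (1 + 2) * 3")).program()
only_expr = Parser(tokenize("2 * (3 + 4)")).expr()
```

The last two lines show the sub-rule idea: the same parser object can start at `program` or directly at `expr`, without a separate grammar.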

Additionally, its error recovery is way better and produces much cleaner errors. There are various IDEs/plugins/extensions that make working with ANTLR grammars pretty easy (ANTLRWorks, the IntelliJ plugin, the Visual Studio Code extension etc.). And you can generate parsers in different languages (C, C++, C#, Java and more) from the same grammar (unless you have language-specific actions in your grammar; you mentioned this in your question already). And while we speak of actions: due to the evaluation principle in a bottom-up parser (shift token, shift token, reduce them to a new symbol and shift it, etc.) actions can easily cause trouble there, e.g. by executing more than once. Not so with parsers generated by ANTLR.
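For readers who have not looked inside a bottom-up parser, the shift/reduce principle mentioned above can be sketched roughly as follows. This is a hand-rolled toy loop in Python, not what yacc/bison actually emit (those generate table-driven LALR(1) automata); the grammar `sum : sum '+' NUM | NUM` and the function `parse_sum` are assumptions made up for this illustration. It only shows that a semantic action runs when a rule is reduced, not when its tokens are read.

```python
def parse_sum(tokens):
    """Parse NUM ('+' NUM)* bottom-up; return (value, trace of steps).

    Tokens are (symbol, value) pairs such as ("NUM", 1) or ("+", None).
    """
    stack, trace = [], []

    def reduce_once():
        symbols = [sym for sym, _ in stack]
        # Check the longer right-hand side first: sum : sum '+' NUM
        if symbols[-3:] == ["sum", "+", "NUM"]:
            (_, left), _, (_, right) = stack[-3:]
            del stack[-3:]
            stack.append(("sum", left + right))  # semantic action runs here
            trace.append("reduce sum : sum + NUM")
            return True
        # sum : NUM
        if symbols[-1:] == ["NUM"]:
            _, value = stack.pop()
            stack.append(("sum", value))  # semantic action runs here
            trace.append("reduce sum : NUM")
            return True
        return False

    for token in tokens:
        stack.append(token)  # shift
        trace.append(f"shift {token[0]}")
        while reduce_once():  # reduce for as long as a rule matches
            pass

    if [sym for sym, _ in stack] != ["sum"]:
        raise SyntaxError(f"could not reduce input to a single sum: {stack}")
    return stack[0][1], trace


value, trace = parse_sum([("NUM", 1), ("+", None), ("NUM", 2)])
```

The `trace` list records the interleaving of shifts and reductions, which is the order in which grammar actions would fire in this simplified model.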

I have also tried various parser generators over the years, and even wrote my own, but I would recommend ANTLR as the tool of choice any time.

Mike Lischke answered Oct 07 '22 00:10