Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Appropriate uses for yacc/byacc/bison and lex/flex

Most of the posts that I read pertaining to these utilities usually suggest using some other method to obtain the same effect. For example, questions mentioning these tools usual have at least one answer containing some of the following:

  • Use the boost library (insert appropriate boost library here)
  • Don't create a DSL use (insert favorite scripting language here)
  • Antlr is better

Assuming the developer ...

  • ... is comfortable with the C language
  • ... does know at least one scripting language (e.g., Python, Perl, etc.)
  • ... must write some parsing code in almost every project worked on

So my questions are:

  • What are appropriate situations which are well suited for these utilities?
  • Are there any (reasonable) situations where there is not a better alternative to a problem than yacc and lex (or derivatives)?
  • How often in actual parsing problems can one expect to run into any short comings in yacc and lex which are better addressed by more recent solutions?
  • For a developer which is not already familiar with these tools is it worth it for them to invest time in learning their syntax/idioms? How do these compare with other solutions?
like image 643
bpw1621 Avatar asked Mar 10 '10 03:03

bpw1621


2 Answers

The reasons why lex/yacc and derivatives seem so ubiquitous today are that they have been around for much longer than other tools, that they have far more coverage in the literature and that they traditionally came with Unix operating systems. It has very little to do with how they compare to other lexer and parser generator tools.

No matter which tool you pick, there is always going to be a significant learning curve. So once you have used a given tool a few times and become relatively comfortable in its use, you are unlikely to want to incur the extra effort of learning another tool. That's only natural.

Also, in the late 1960s and early 1970s when lex/yacc were created, hardware limitations posed a serious challenge to parsing. The table driven LR parsing method used by Yacc was the most suitable at the time because it could be implemented with a small memory footprint by using a relatively small general program logic and by keeping state in files on tape or disk. Code driven parsing methods such as LL had a larger minimum memory footprint because the parser program's code itself represents the grammar and therefore it needs to fit entirely into RAM to execute and it keeps state on the stack in RAM.

When memory became more plentiful a lot more research went into different parsing methods such as LL and PEG and how to build tools using those methods. This means that many of the alternative tools that have been created after the lex/yacc family use different types of grammars. However, switching grammar types also incurs a significant learning curve. Once you are familiar with one type of grammar, for example LR or LALR grammars, you are less likely to want to switch to a tool that uses a different type of grammar, for example LL grammars.

Overall, the lex/yacc family of tools is generally more rudimentary than more recent arrivals which often have sophisticated user interfaces to graphically visualise grammars and grammar conflicts or even resolve conflicts through automatic refactoring.

So, if you have no prior experience with any parser tools, if you have to learn a new tool anyway, then you should probably look at other factors such as graphical visualisation of grammars and conflicts, auto-refactoring, availability of good documentation, languages in which the generated lexers/parsers can be output etc etc. Don't pick any tool simply because "this is what everybody else seems to be using".

Here are some reasons I could think of for using lex/yacc or flex/bison :

  • the developer is already familiar with lex/yacc or flex/bison
  • the developer is most familiar and comfortable with LR/LALR grammars
  • the developer has plenty of books covering lex/yacc but no books covering others
  • the developer has a prospective job offer coming up and has been told that lex/yacc skills would increase his chances to get hired
  • the developer could not get buy-in from project members/stake holders for the use of other tools
  • the environment has lex/yacc installed and for some reason it is not feasible to install other tools
like image 103
Jim Barker Avatar answered Sep 20 '22 18:09

Jim Barker


Whether it's worth learning these tools or not will depend heavily (almost entirely on how much parsing code you write, or how interested you are in writing more code on that general order. I've used them quite a bit, and find them extremely useful.

The tool you use doesn't really make as much difference as many would have you believe. For about 95% of the inputs I've had to deal with, there's little enough difference between one and another that the best choice is simply the one with which I'm most familiar and comfortable.

Of course, lex and yacc produce (and demand that you write your actions in) C (or C++). If you're not comfortable with them, a tool that uses and produces a language you prefer (e.g. Python or Java) will undoubtedly be a much better choice. I, for one, would not advise trying to use a tool like this with a language with which you're unfamiliar or uncomfortable. In particular, if you write code in an action that produces a compiler error, you'll probably get considerably less help from the compiler than usual in tracking down the problem, so you really need to be familiar enough with the language to recognize the problem with only a minimal hint about where compiler noticed something being wrong.

like image 30
Jerry Coffin Avatar answered Sep 21 '22 18:09

Jerry Coffin