
C++: what is the advantage of lex and bison over a self-made tokenizer/parser?

I would like to do some parsing and tokenizing in C++ for learning purposes. I have often come across bison/yacc and lex when reading about this subject online. Would there be any major benefit to using those over, for instance, a tokenizer/parser written using the STL or boost::regex, or maybe even just C?

moka asked Jul 13 '10 14:07


2 Answers

I recently undertook writing a simple lexer and parser.

It turned out that the lexer was simpler to code by hand. But the parser was a little more difficult. My Bison-generated parser worked almost right off the bat, and it gave me a lot of helpful messages about where I had forgotten about states. I later wrote the same parser by hand but it took a lot more debugging before I had it working perfectly.
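
To give a sense of why a lexer is manageable by hand, here is a minimal C++ sketch of one for a tiny arithmetic language. It is an illustration only, not the code this answer refers to; the `TokKind`, `Token`, and `tokenize` names are hypothetical, and it assumes ASCII input with unsigned integer literals.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical token kinds for a tiny arithmetic language.
enum class TokKind { Number, Plus, Minus, Star, Slash, LParen, RParen, End, Error };

struct Token {
    TokKind kind;
    std::string text;
};

// Scan the whole input into tokens. Stops at the first unrecognized
// character, which is reported as a trailing Error token.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }  // skip whitespace
        if (std::isdigit(c)) {                   // consume a run of digits
            std::size_t start = i;
            while (i < src.size() &&
                   std::isdigit(static_cast<unsigned char>(src[i]))) {
                ++i;
            }
            out.push_back({TokKind::Number, src.substr(start, i - start)});
            continue;
        }
        TokKind k;
        switch (c) {                             // single-character tokens
            case '+': k = TokKind::Plus;   break;
            case '-': k = TokKind::Minus;  break;
            case '*': k = TokKind::Star;   break;
            case '/': k = TokKind::Slash;  break;
            case '(': k = TokKind::LParen; break;
            case ')': k = TokKind::RParen; break;
            default:  k = TokKind::Error;  break;
        }
        out.push_back({k, std::string(1, src[i])});
        ++i;
        if (k == TokKind::Error) return out;
    }
    out.push_back({TokKind::End, ""});
    return out;
}
```

Calling `tokenize("12 + 3*(4)")` yields seven tokens followed by an `End` marker. The whole thing is one loop and one `switch`, which is roughly why the lexer tends to be the easy half.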

The appeal of generator tools for lexers and parsers is that you can write the specification in a clean, easy-to-read language that comes close to being the shortest possible rendition of your spec. A hand-written parser is usually at least twice as big. Also, the generated parser (or lexer) comes with a lot of diagnostic code and logic to help you get the thing debugged.
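
As a rough illustration of the size difference, the comment at the top of the sketch below states a three-rule grammar in BNF-like form, and the rest is a hand-written recursive-descent parser (and evaluator) for it. This is a sketch under assumptions: the `Parser` class and its method names are hypothetical, values are plain `long` integers, and no generator output is shown.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>
#include <utility>

// Grammar being parsed (BNF-like, close to what a generator spec would state):
//   expr   : term   (('+' | '-') term)*
//   term   : factor (('*' | '/') factor)*
//   factor : NUMBER | '(' expr ')'
class Parser {
public:
    explicit Parser(std::string src) : src_(std::move(src)) {}

    // Parse and evaluate the whole input; throws on any syntax error.
    long parse() {
        long v = expr();
        skipSpace();
        if (pos_ != src_.size()) throw std::runtime_error("trailing input");
        return v;
    }

private:
    static bool isDigit(char c) {
        return std::isdigit(static_cast<unsigned char>(c)) != 0;
    }
    void skipSpace() {
        while (pos_ < src_.size() &&
               std::isspace(static_cast<unsigned char>(src_[pos_]))) {
            ++pos_;
        }
    }
    // Consume c if it is the next non-space character.
    bool accept(char c) {
        skipSpace();
        if (pos_ < src_.size() && src_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    long expr() {
        long v = term();
        for (;;) {
            if (accept('+')) v += term();
            else if (accept('-')) v -= term();
            else return v;
        }
    }
    long term() {
        long v = factor();
        for (;;) {
            if (accept('*')) {
                v *= factor();
            } else if (accept('/')) {
                long d = factor();
                if (d == 0) throw std::runtime_error("division by zero");
                v /= d;
            } else {
                return v;
            }
        }
    }
    long factor() {
        if (accept('(')) {
            long v = expr();
            if (!accept(')')) throw std::runtime_error("expected ')'");
            return v;
        }
        skipSpace();
        if (pos_ >= src_.size() || !isDigit(src_[pos_])) {
            throw std::runtime_error("expected number");
        }
        long v = 0;
        while (pos_ < src_.size() && isDigit(src_[pos_])) {
            v = v * 10 + (src_[pos_++] - '0');
        }
        return v;
    }

    std::string src_;
    std::size_t pos_ = 0;
};
```

For example, `Parser("1 + 2 * 3").parse()` returns 7. Three grammar lines became three mutually recursive functions plus helpers, and every error message had to be written by hand.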

A parser/lexer spec in BNF-like language is also a lot easier to change, should your language or requirements change. If you're dealing with a hand-written parser/lexer, you may need to dig deeply into your code and make significant changes.

Finally, because generated lexers and parsers are often implemented as table-driven state machines without backtracking (Bison has gazillions of options, so this is not always a given), it's quite possible that your auto-generated code will be more efficient than your hand-coded product.
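
The "state machine without backtracking" idea can be sketched in miniature as a table-driven recognizer that looks at each input character exactly once. This is a hand-made toy, not what Flex or Bison actually emit; the states, character classes, and the `recognize` function are all hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// States and input classes of a small hand-built DFA.
enum State { Start, Int, Dot, Frac, Ident, Reject, NumStates };
enum CharClass { Digit, Letter, Period, Other, NumClasses };

// Map a character to its input class.
CharClass classify(char ch) {
    unsigned char c = static_cast<unsigned char>(ch);
    if (std::isdigit(c)) return Digit;
    if (std::isalpha(c) || c == '_') return Letter;
    if (c == '.') return Period;
    return Other;
}

// kTransition[state][class] gives the next state.
constexpr State kTransition[NumStates][NumClasses] = {
    //              Digit   Letter  Period  Other
    /* Start  */ {  Int,    Ident,  Reject, Reject },
    /* Int    */ {  Int,    Reject, Dot,    Reject },
    /* Dot    */ {  Frac,   Reject, Reject, Reject },
    /* Frac   */ {  Frac,   Reject, Reject, Reject },
    /* Ident  */ {  Ident,  Ident,  Reject, Reject },
    /* Reject */ {  Reject, Reject, Reject, Reject },
};

// One table lookup per character, no backtracking: the final state
// decides what the whole string is.
std::string recognize(const std::string& s) {
    State st = Start;
    for (char ch : s) {
        st = kTransition[st][classify(ch)];
    }
    switch (st) {
        case Int:   return "integer";
        case Frac:  return "decimal";
        case Ident: return "identifier";
        default:    return "invalid";
    }
}
```

Here `recognize("3.14")` returns `"decimal"` and `recognize("3.")` returns `"invalid"`. The work per character is constant, whereas a hand-written matcher that tries alternatives one after another may rescan the same input several times.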

Carl Smotricz answered Sep 20 '22 02:09


Somebody else has already written and DEBUGGED them for you?

Martin Beckett answered Sep 21 '22 02:09