I recently added source-file parsing to an existing tool that generates output files from complex command-line arguments.
The arguments became so complex that we started allowing them to be supplied in a file, which was parsed as if it were one very large command line, but the syntax was still awkward. So I added the ability to parse a source file using a more reasonable syntax.
I used flex 2.5.4 for Windows to generate the tokenizer for this custom source-file format, and it worked. But I hated the code: global variables, a weird naming convention, and the C++ it generated was awful. The existing code-generation backend was glued to the output of flex; I don't use yacc or bison.
I'm about to dive back into that code, and I'd like to use a better, more modern tool. Does anyone know of a good alternative?
I could probably live with a tool that only generated the tokenizing tables, if that were all that was available.
Flex (fast lexical analyzer generator) is a free, open-source alternative to lex: a program that generates lexical analyzers (also known as "scanners" or "lexers").
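For reference, a minimal flex specification looks something like this (a sketch with illustrative token rules, not taken from the original tool):

    %{
    #include <stdio.h>
    %}

    %option noyywrap

    %%
    [0-9]+                  { printf("NUMBER: %s\n", yytext); }
    [A-Za-z_][A-Za-z0-9_]*  { printf("IDENT: %s\n", yytext);  }
    [ \t\r\n]+              ; /* skip whitespace */
    .                       { printf("UNKNOWN: %s\n", yytext); }
    %%

    int main(void) { return yylex(); }

Running flex on this produces lex.yy.c with the yylex() entry point, plus the globals (yytext, yyleng, and friends) that the questioner is complaining about.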
Ragel (http://www.complang.org/ragel/) fits most of your requirements.
The code it generates interferes very little with the host program. The code is also incredibly fast, and the Ragel syntax is more flexible and readable than anything else I've seen. It's a rock-solid piece of software, and it can generate either a table-driven or a goto-driven parser.
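To give a feel for how it embeds in your code, here is a minimal sketch of a Ragel scanner (assuming Ragel 6.x syntax; the machine name toy and the token rules are illustrative):

    #include <cstdio>
    #include <cstring>

    %%{
        machine toy;

        main := |*
            digit+                 => { std::printf("NUMBER: %.*s\n", (int)(te - ts), ts); };
            alpha ( alnum | '_' )* => { std::printf("IDENT: %.*s\n",  (int)(te - ts), ts); };
            space;   # skip whitespace
        *|;
    }%%

    %% write data;

    bool scan(const char *str)
    {
        // Standard Ragel interface variables: current state, token bounds,
        // input pointers. The generated code reads and updates these in place.
        const char *p = str, *pe = str + std::strlen(str), *eof = pe;
        const char *ts, *te;
        int cs, act;

        %% write init;
        %% write exec;

        return cs >= toy_first_final;
    }

    int main() { return scan("count 42 total 7") ? 0 : 1; }

You run ragel over this file to produce ordinary C++ with no library dependency; the -T0 flag emits the table-driven version and -G2 emits the goto-driven version, which is typically the fastest.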
Boost.Spirit.Qi (parser and tokenizer) or Boost.Spirit.Lex (tokenizer only). I absolutely love Qi, and Lex is not bad either, but I just tend to reach for Qi for my parsing needs...
The only real drawback with Qi tends to be an increase in compile time, and it also runs slightly slower than hand-written parsing code. It is generally much faster than parsing with regexes, though.
http://www.boost.org/doc/libs/1_41_0/libs/spirit/doc/html/index.html
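For comparison, a Qi grammar is written inline as ordinary C++; this is a minimal sketch modeled on the number-list examples in the Spirit docs (the input and grammar here are illustrative):

    #include <boost/spirit/include/qi.hpp>
    #include <iostream>
    #include <string>
    #include <vector>

    namespace qi = boost::spirit::qi;
    namespace ascii = boost::spirit::ascii;

    int main()
    {
        std::string input = "1, 2, 3, 42";
        std::vector<int> values;

        std::string::const_iterator first = input.begin(), last = input.end();

        // Grammar: integers separated by commas, with whitespace skipped.
        bool ok = qi::phrase_parse(first, last,
                                   qi::int_ % ',',
                                   ascii::space,
                                   values);

        if (ok && first == last)
            std::cout << "parsed " << values.size() << " integers\n";
        else
            std::cout << "parse failed\n";
        return 0;
    }

Because the grammar is ordinary C++ expression templates, there is no separate code-generation step; that is also where the compile-time cost mentioned above comes from.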