
Are there any free parser generators that generate C++ code and handle Unicode correctly?

After asking this question, I'm now sold on trying to use a parser generator, where before I was going to write things manually.

However, I can't seem to find any such parser that generates C++ code, nor can I find a parser that correctly handles Unicode. (note that my input is in UCS-2 -- I don't care about supporting bits outside of the Basic Multilingual Plane if that makes building the parser more difficult)

There are some parsers which can generate C, but such parsers all seem to throw exception safety out the window, which would prevent me from using C++ inside any semantic actions.

Does a parser generator exist which meets these two tenets, or am I stuck doing everything by hand?

EDIT: Oh, and my project is BSL licensed, so there can't be many restrictions on use of the output of the parser generator itself.

Billy ONeal asked Nov 30 '10 19:11



2 Answers

There are two ways to do this in C++: use a program that generates C++ files from a grammar written in a free-form syntax, or write the grammar with templates.

You have two choices when writing a grammar as template types. One is boost::proto, where every operator is redefined to build a syntax tree in boost::fusion (used in boost::spirit, boost::msm, boost::xpressive); the basic idea is described here: Expression Templates. The other is to build the expression tree by hand with your own templates and store it directly in boost::mpl containers. This technique is used in biscuit.

In biscuit you have

or_<>, seq_<>, char_<>, ..

templates. Biscuit is based on Yard, but adds an extended boost::range to get better submatch capability.
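To give a feel for the hand-written template style, here is a minimal sketch of a grammar expressed purely as types. It is not Biscuit's or YARD's actual API: the names `char_`, `range_`, `seq_`, `or_`, `star_` and `is_identifier` are hypothetical and only modelled on the templates listed above. It assumes C++17 (for fold expressions) and UCS-2 input held in a `std::u16string`, so each `char16_t` is treated as one character.

```cpp
#include <cassert>
#include <string>

// Hypothetical combinators in the or_<>/seq_<>/char_<> style.
// Not taken from Biscuit or YARD; for illustration only.
using Iter = std::u16string::const_iterator;

// Matches exactly one code unit equal to C.
template <char16_t C>
struct char_ {
    static bool parse(Iter& it, Iter end) {
        if (it != end && *it == C) { ++it; return true; }
        return false;
    }
};

// Matches one code unit in the inclusive range [Lo, Hi].
template <char16_t Lo, char16_t Hi>
struct range_ {
    static bool parse(Iter& it, Iter end) {
        if (it != end && *it >= Lo && *it <= Hi) { ++it; return true; }
        return false;
    }
};

// Sequence: all sub-parsers must match in order.
// The position is restored if any of them fails.
template <typename... Ps>
struct seq_ {
    static bool parse(Iter& it, Iter end) {
        Iter save = it;
        bool ok = (Ps::parse(it, end) && ...);
        if (!ok) it = save;
        return ok;
    }
};

// Ordered choice: the first alternative that matches wins.
template <typename... Ps>
struct or_ {
    static bool parse(Iter& it, Iter end) {
        return (Ps::parse(it, end) || ...);
    }
};

// Zero or more repetitions; always succeeds.
template <typename P>
struct star_ {
    static bool parse(Iter& it, Iter end) {
        while (it != end) {
            Iter before = it;
            // Stop on failure, or if P matched without consuming input.
            if (!P::parse(it, end) || it == before) break;
        }
        return true;
    }
};

// Example grammar: identifier = letter (letter | digit)*
// "letter" includes Greek alpha..omega to show non-ASCII BMP input.
using letter = or_<range_<u'a', u'z'>, range_<u'A', u'Z'>, char_<u'_'>,
                   range_<u'\u03B1', u'\u03C9'>>;
using digit = range_<u'0', u'9'>;
using identifier = seq_<letter, star_<or_<letter, digit>>>;

// True if the whole string is a single identifier.
bool is_identifier(const std::u16string& s) {
    Iter it = s.begin();
    return identifier::parse(it, s.end()) && it == s.end();
}

// Small usage example.
inline void check_examples() {
    assert(is_identifier(u"x1"));
    assert(!is_identifier(u"1x"));
}
```

Because the grammar is ordinary C++, semantic actions can throw and use RAII freely, which is the exception-safety property the question asks for. A real library adds submatch capture, error reporting and richer character classes on top of this idea.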

The Biscuit Parser Library 1

The Biscuit Parser Library 2

Yet Another Recursive Descent (YARD) parsing framework for C++

Industrial-antidepressant answered Oct 20 '22 01:10


Alright, this might be a long shot, but there is an LALR parser generator that is a side project of Qt, called QLALR. It is a really thin layer: the lexing is still up to you, but all the work can be done through QStrings, which support Unicode. There is not a lot of functionality to it; you write the grammar with the code that does the work for each token, and it generates the parser for you. But I have used it successfully to generate a parser with ~100 rules, creating an AST of the language parsed.

Harald Scheirich answered Oct 20 '22 01:10