
Can Boost.Spirit be theoretically/practically used to parse C++(0x) (or any other language)?

Is it theoretically up to the task?

Can it be done in practice, and would the resulting parser be fast enough, and its output (say, LLVM IR or GCC's GIMPLE) good enough, to be integrated into a competing compiler?

rubenvb asked May 19 '11 19:05


2 Answers

I'm sorry. I talked to Scalpel's author, and he said he won't make it parse C++ fully; he accepts that it will parse certain constructs as ambiguous.

So this is not an answer anymore!!


I recommend having a look at Scalpel. From its homepage:

Scalpel stands for source code analysis, libre and portable library. This is a C++ library which aims to perform full syntax and semantic analysis of any given C++ program.

And

What makes me think Scalpel could be accepted into Boost

Scalpel uses itself several Boost libraries: Spirit, Wave, shared_ptr (now in C++0x's STL), Optional, Test, etc.. Actually, it exclusively uses Boost libraries and the C++ standard library, which is required by Boost.

Besides, Boost already provides a Spirit-based C++ source code preprocessing library: Wave. Including a C++ source code analysis library seems to be a natural evolution.

2 revs answered Sep 27 '22 19:09


No. C++ is too hard to parse for most automatic tools, and in practice it is usually parsed by hand-written parsers. [Edit 1-Mar-2015: Added 'most' and 'usually'.]

Among the hard problems are:

  • A * B; could be either a declaration of a variable B of type A* or a multiplication of two variables A and B, depending on whether A names a type.
  • A < B > C > D: where does the template argument list of A<> end? The usual 'maximal munch' rule for parsing expressions does not work here.
  • vector<shared_ptr<int>>: here >> closes two template argument lists, which is hard to handle when the lexer produces a single >> token (and a space between the two > characters is allowed). In 1>>15, however, no space is allowed.

And I bet that this list is far from complete.
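Two of the cases listed above can be reproduced in a few lines of C++. This is only an illustrative sketch: the namespaces, the Num type and the function names are made up for the example and do not come from any library.

```cpp
#include <memory>
#include <vector>

// Reading 1: A names a type, so `A * B;` declares a pointer named B.
namespace as_declaration {
struct A {};
bool run() {
    A * B;            // declaration of B with type A*
    B = nullptr;
    return B == nullptr;
}
}  // namespace as_declaration

// Reading 2: A and B name objects, so the same tokens are a multiplication.
namespace as_expression {
struct Num { int value; };
int multiplications = 0;  // counts calls so the effect is observable
Num operator*(Num lhs, Num rhs) {
    ++multiplications;
    return Num{lhs.value * rhs.value};
}
int run() {
    Num A{6}, B{7};
    const int before = multiplications;
    A * B;            // expression statement, result discarded
    return multiplications - before;
}
}  // namespace as_expression

// `>>` closes two template argument lists here (valid since C++11) ...
std::vector<std::shared_ptr<int>> make_pointers() {
    std::vector<std::shared_ptr<int>> v;
    v.push_back(std::make_shared<int>(42));
    return v;
}

// ... but is a single right-shift operator here.
int shift() { return 1024 >> 3; }
```

The token sequence `A * B;` is identical in both namespaces; only the earlier declarations of A decide which reading the compiler takes.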

Addition: The grammar is available, but is ambiguous and thus not valid as input to tools like Spirit.

Update 1-Mar-2015: As Ira Baxter, a well-known expert in this field, points out in the comments, some parser generators can produce a parser that returns the full parse forest, i.e. every possible parse of an ambiguous input. As far as I know, selecting the right parse still requires a semantic phase. I'm not aware of any non-commercial parser generators that can do so for C++'s grammar. For more information, see this answer.
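To make the "semantic phase" concrete, here is a toy sketch for the `X * Y;` case only. Everything in it (Reading, parse_forest_for_star_statement, select_parse) is a hypothetical name invented for illustration; a real front end would consult a full scoped symbol table, not a flat set of type names.

```cpp
#include <set>
#include <string>
#include <vector>

// The two readings a context-free parser could report for `X * Y;`.
enum class Reading { Declaration, Expression };

// Stand-in for the parse forest: both alternatives are kept because
// syntax alone cannot rule either one out.
std::vector<Reading> parse_forest_for_star_statement() {
    return {Reading::Declaration, Reading::Expression};
}

// Semantic phase: keep the alternative that matches what the first
// identifier names in the current scope.
Reading select_parse(const std::vector<Reading>& forest,
                     const std::string& first_identifier,
                     const std::set<std::string>& type_names) {
    const Reading wanted = type_names.count(first_identifier) != 0
                               ? Reading::Declaration
                               : Reading::Expression;
    for (Reading r : forest) {
        if (r == wanted) return r;
    }
    return Reading::Expression;  // fallback for this sketch only
}
```

The parser stays purely syntactic and defers the choice; the price is that name lookup has to be available before the parse of later statements can be settled.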

Sjoerd answered Sep 27 '22 17:09