
What libraries are available for parsing C++ to extract type information?

Tags: c++, types, parsing

I'm looking for a way to parse C++ code to retrieve some basic information about classes. I don't need much information from the code itself, but I do need the parser to handle things like macros and templates. In short, I want to extract the "structure" of the code: the kind of thing you would show in a UML diagram.

For each class/struct/union/enum/typedef in the code base, all I need (after templates & macros have been handled) is:

  • Their name
  • The namespace in which they live
  • The fields contained within (type name, field name, and access restrictions or qualifiers such as private, mutable, etc.)
  • Functions contained within (return type, name, parameters)
  • The declaring file
  • Line/column numbers (or byte offset in file) where the definition of this data begins
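
To make the list above concrete, here is a minimal sketch of the record I would want per type. It is written in Python purely for illustration; every class and field name in it (`TypeInfo`, `FieldInfo`, `FunctionInfo`, `SourceLocation`) is hypothetical and not taken from any existing library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SourceLocation:
    """Where a definition begins (hypothetical record, for illustration only)."""
    file: str
    line: int
    column: Optional[int] = None  # a byte offset could be stored instead


@dataclass
class FieldInfo:
    type_name: str
    name: str
    access: str = "public"   # "public", "protected" or "private"
    mutable: bool = False


@dataclass
class FunctionInfo:
    return_type: str
    name: str
    parameters: List[str] = field(default_factory=list)  # parameter type names
    access: str = "public"


@dataclass
class TypeInfo:
    kind: str        # "class", "struct", "union", "enum" or "typedef"
    name: str
    namespace: str   # "" for the global namespace
    location: SourceLocation
    fields: List[FieldInfo] = field(default_factory=list)
    functions: List[FunctionInfo] = field(default_factory=list)

    def qualified_name(self) -> str:
        """Join namespace and name the way C++ spells it."""
        return f"{self.namespace}::{self.name}" if self.namespace else self.name


# Example: what the tool should produce for a small class geo::Point.
point = TypeInfo(
    kind="class",
    name="Point",
    namespace="geo",
    location=SourceLocation(file="point.h", line=3, column=1),
    fields=[FieldInfo("int", "x", access="private")],
    functions=[FunctionInfo("double", "norm", ["double"])],
)
```

Anything that can fill in records of roughly this shape, after macro expansion and template handling, would do.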

The actual instructions in the code are irrelevant for my purposes.

I'm anticipating a lot of people saying I should just use a regex for this (or even Flex & Bison), but those aren't really viable options, since I need the preprocessor and templates handled properly.

Grant Peters asked Oct 05 '09 15:10


3 Answers

Sounds like a job for GCC-XML, combined with the C++ XML library or XML-friendly scripting language of your choice.
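
As a rough illustration of the "scripting language" half of that suggestion, the sketch below reads a small GCC-XML-style document with Python's standard `xml.etree.ElementTree`. The embedded sample is hand-written and only assumed to resemble real GCC-XML output (flat elements cross-referenced by `id`, with `context`, `members`, `type`, `returns`, `file` and `line` attributes); check it against what your version of the tool actually emits. The helper names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, assumed to resemble GCC-XML output for:
#   namespace geo { class Point { int x; public: double norm(double scale); }; }
SAMPLE_GCCXML = """
<GCC_XML>
  <Namespace id="_1" name="::" members="_2"/>
  <Namespace id="_2" name="geo" context="_1" members="_3"/>
  <Class id="_3" name="Point" context="_2" file="f0" line="3" members="_4 _5"/>
  <Field id="_4" name="x" type="_6" context="_3" access="private" file="f0" line="4"/>
  <Method id="_5" name="norm" returns="_7" context="_3" access="public" file="f0" line="6">
    <Argument name="scale" type="_7"/>
  </Method>
  <FundamentalType id="_6" name="int"/>
  <FundamentalType id="_7" name="double"/>
  <File id="f0" name="point.h"/>
</GCC_XML>
"""


def namespace_of(element, by_id):
    """Walk the 'context' chain upwards and join the enclosing namespace names."""
    parts = []
    ctx = by_id.get(element.get("context"))
    while ctx is not None and ctx.tag == "Namespace" and ctx.get("name") != "::":
        parts.append(ctx.get("name"))
        ctx = by_id.get(ctx.get("context"))
    return "::".join(reversed(parts))


def type_name(type_id, by_id):
    """Resolve a type id to a name; unnamed nodes (e.g. pointers) yield '?'."""
    element = by_id.get(type_id)
    return element.get("name", "?") if element is not None else "?"


def describe_classes(xml_text):
    """Return one dict per class/struct/union found at the top level."""
    root = ET.fromstring(xml_text)
    by_id = {el.get("id"): el for el in root if el.get("id")}
    result = []
    for cls in root:
        if cls.tag not in ("Class", "Struct", "Union"):
            continue
        members = [by_id[m] for m in cls.get("members", "").split() if m in by_id]
        result.append({
            "name": cls.get("name"),
            "namespace": namespace_of(cls, by_id),
            "file": by_id[cls.get("file")].get("name"),
            "line": int(cls.get("line")),
            "fields": [
                (type_name(m.get("type"), by_id), m.get("name"), m.get("access"))
                for m in members if m.tag == "Field"
            ],
            "methods": [
                (
                    type_name(m.get("returns"), by_id),
                    m.get("name"),
                    [(type_name(a.get("type"), by_id), a.get("name"))
                     for a in m.findall("Argument")],
                )
                for m in members if m.tag == "Method"
            ],
        })
    return result


classes = describe_classes(SAMPLE_GCCXML)
```

A real script would also need to follow pointer, reference and cv-qualified type nodes to build full type names; this sketch simply reports `?` for anything without a `name` attribute.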

Georg Fritzsche answered Nov 01 '22 01:11


  • Elsa: The Elkhound-based C/C++ Parser,
  • clang: a C language family frontend for LLVM (see also the Clang Static Analyzer),
  • ANTLR Parser Generator Grammar List (search for C++, there is more than one grammar),
  • OpenC++ (adds reflection capabilities to C++),
  • Stratego XT (whole-program transformation; see CodeBoost, which uses the OpenC++ parser mentioned above, for an example application to C++ programs),
  • Parsing C++ at nobugs.org (not a parser, but interesting background; in particular Edward D. Willink's "Meta-Compilation for C++" PhD thesis and Mike Dimmick's overview of his attempt to parse C++).

See also Ira Baxter here, where he cites his own product.

Warning: of these, only Elsa "...I hear does a fairly good job..." of constructing a symbol table, which according to Ira Baxter is necessary for the OP's original intent (see the comments on this answer; I quote him because he is an expert in the field).

MaD70 answered Nov 01 '22 02:11


Running Doxygen on the code would give you most of that, wouldn't it?

In what format do you want the output?
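
If XML is acceptable, Doxygen can write it when `GENERATE_XML = YES` is set in the Doxyfile. Below is a minimal sketch of reading that output with Python's standard library. The sample document is hand-written and only assumed to follow the general shape of Doxygen's XML (`compounddef`, `memberdef`, `location` elements), so verify it against what your Doxygen version produces; the function names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, assumed to resemble Doxygen's XML for a class geo::Point.
SAMPLE_DOXYGEN_XML = """
<doxygen>
  <compounddef id="classgeo_1_1Point" kind="class" prot="public">
    <compoundname>geo::Point</compoundname>
    <sectiondef kind="private-attrib">
      <memberdef kind="variable" id="m1" prot="private" static="no" mutable="no">
        <type>int</type>
        <name>x</name>
        <location file="point.h" line="4" column="9"/>
      </memberdef>
    </sectiondef>
    <sectiondef kind="public-func">
      <memberdef kind="function" id="m2" prot="public" static="no">
        <type>double</type>
        <name>norm</name>
        <param><type>double</type><declname>scale</declname></param>
        <location file="point.h" line="6" column="12"/>
      </memberdef>
    </sectiondef>
    <location file="point.h" line="3" column="1"/>
  </compounddef>
</doxygen>
"""


def text_of(element):
    """Flatten an element's text, including text inside nested tags."""
    return "".join(element.itertext()).strip() if element is not None else ""


def summarize(xml_text):
    """Return one dict per compound (class, struct, ...) with its members."""
    root = ET.fromstring(xml_text)
    summaries = []
    for comp in root.iter("compounddef"):
        loc = comp.find("location")  # direct child only: the compound's own location
        members = []
        for m in comp.iter("memberdef"):
            mloc = m.find("location")
            members.append({
                "kind": m.get("kind"),      # "variable", "function", ...
                "access": m.get("prot"),    # "public", "protected", "private"
                "type": text_of(m.find("type")),
                "name": text_of(m.find("name")),
                "params": [
                    (text_of(p.find("type")), text_of(p.find("declname")))
                    for p in m.findall("param")
                ],
                "line": int(mloc.get("line")) if mloc is not None else None,
            })
        summaries.append({
            "kind": comp.get("kind"),
            "name": text_of(comp.find("compoundname")),
            "file": loc.get("file") if loc is not None else None,
            "line": int(loc.get("line")) if loc is not None else None,
            "members": members,
        })
    return summaries


summary = summarize(SAMPLE_DOXYGEN_XML)
```

How well this copes with macros depends on Doxygen's preprocessing settings (`ENABLE_PREPROCESSING`, `MACRO_EXPANSION`), and Doxygen documents templates rather than instantiating them, so it may fall short of the "templates handled" requirement.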

John Carter answered Nov 01 '22 03:11