
Is there a command-line tool to extract a typedef, structure, enumeration, variable, or function from a C or C++ file?

I am looking for a command-line tool to extract a definition or declaration (typedef, structure, enumeration, variable, or function) from a C or C++ source file. A way to replace an existing definition or declaration would also be handy (after transforming the extracted definition with a user-supplied script). Is such a generic tool available, or at least a reasonably close approximation of one?

Scriptability and the ability to hook up user-created scripts or programs are important here, although I am academically curious about GUI programs too. Open source solutions for the Unix/Linux camp are preferred (although I am curious about Windows and OS X tools too). My primary language interests are C and C++, but a more generic solution would be even better (I think we do not need super-accurate parsing capabilities for finding, extracting, and replacing a definition in a program source file).

Sample Use Cases (extra - for the curious mind):

  1. Given deeply nested structs and variable (array) initializations of these types, suppose there is a need to change a struct definition by adding or reordering fields, or to rewrite the variable/array definitions in a more readable format, without introducing the errors that come with manual labor. This would work by extracting the old initializations, then using a script/program to write the new initializations that replace the old ones.
  2. For implementing a code browsing tool - extract a definition.
  3. Decorative code generation (e.g. logging function entries / returns).
  4. Scripted code restructuring (e.g. extract this and that thing and put them in a different place without change; the version control commit comment could document the command that performed the operation, making it evident and verifiable that nothing changed).

Alternative problem: If there is a tool that reports the location of a definition (the beginning and end lines would suffice; we could even assume that all the definitions/declarations we are interested in are on their own lines), then it would be a simple exercise of finger dexterity to write a program to

  1. extract definitions,
  2. replace definitions, or even
  3. extract a definition, run a program specified by command line options (or an editor) to

    • receive the desired extracted definitions from stdin (or from a temporary file),
    • perform the transformation (editing), and
    • output the new definitions to stdout (or save them to the given temporary file)

    to be replaced by the executing program.
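Assuming the begin and end lines are already known, the three steps above can be sketched in a few lines of Python. This is only an illustration: the function names `extract`, `replace`, and `transform` and the sample filter `WIDEN_INTS` are made up for this sketch, and the "user script" is any command that reads a definition on stdin and writes the new one to stdout.

```python
import subprocess
import sys


def extract(lines, begin, end):
    """Return lines begin..end (1-based, inclusive) joined into one string."""
    return "".join(lines[begin - 1:end])


def replace(lines, begin, end, new_text):
    """Return a new list of lines with begin..end replaced by new_text."""
    new_lines = new_text.splitlines(keepends=True)
    if new_lines and not new_lines[-1].endswith("\n"):
        new_lines[-1] += "\n"  # keep the following line on its own line
    return lines[:begin - 1] + new_lines + lines[end:]


def transform(lines, begin, end, command):
    """Pipe the extracted definition through `command` and splice in its output."""
    result = subprocess.run(command, input=extract(lines, begin, end),
                            capture_output=True, text=True, check=True)
    return replace(lines, begin, end, result.stdout)


# Hypothetical filter standing in for a user-submitted script:
# it rewrites "int " to "long " in whatever it receives on stdin.
WIDEN_INTS = [sys.executable, "-c",
              "import sys; sys.stdout.write(sys.stdin.read().replace('int ', 'long '))"]
```

With the file read via `readlines()`, a call such as `transform(lines, 2, 5, WIDEN_INTS)` returns the new file contents as a list of lines; text outside the given range is never touched, which is what makes the operation easy to verify.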

So the major, more challenging problem would be finding the begin and end line of the definition.
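As a rough idea of how far a non-parsing approach can get, here is a crude sketch that finds the end line of a definition by tracking brace depth while skipping comments and string/character literals. It is an approximation only, under several assumptions: the caller already knows the begin line, literals do not span lines, preprocessor directives are ignored, and a trailing declarator on a later line (a `}` followed by `name_t;` on the next line) is missed. The function name `find_definition_end` is made up for this example.

```python
def find_definition_end(lines, begin):
    """Return the 1-based line on which the definition starting at `begin` ends.

    The definition is taken to end at the first ';' outside all braces,
    or at the '}' that closes the outermost brace, whichever comes first.
    """
    depth = 0
    in_block_comment = False
    for lineno in range(begin, len(lines) + 1):
        line = lines[lineno - 1]
        i = 0
        while i < len(line):
            ch = line[i]
            two = line[i:i + 2]
            if in_block_comment:
                if two == "*/":
                    in_block_comment = False
                    i += 2
                else:
                    i += 1
                continue
            if two == "//":
                break  # the rest of the line is a comment
            if two == "/*":
                in_block_comment = True
                i += 2
                continue
            if ch in "\"'":
                # Skip a string or character literal, honoring backslash escapes.
                i += 1
                while i < len(line) and line[i] != ch:
                    i += 2 if line[i] == "\\" else 1
                i += 1
                continue
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    return lineno
            elif ch == ";" and depth == 0:
                return lineno
            i += 1
    raise ValueError("no end of definition found")
```

Finding the begin line is the other half of the problem; a tag generator or a simple pattern match on the name could supply it, but that part is not shown here.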

Note about tags: A more accurate tag than code-generation would be code-transformation, but it does not exist.

FooF asked Jun 27 '12 02:06



1 Answer

Our DMS Software Reengineering Toolkit is trying to be the tool you are wishing for. It is pushing the state of the art, though, and isn't a nirvana-style tool. It is good enough to do real, interesting work.

DMS provides general facilities for parsing, analyzing and transforming source code.

It uses explicit grammars to define languages (such as C and C++); the grammars drive parsers that build abstract syntax trees (ASTs). A variety of analysis primitives provide a) facilities ("attribute grammars", ATGs) for collecting information along tree-like information flow paths, which match the shape of ASTs nicely, b) construction of maps from symbol uses to symbol definitions ("symbol tables"), c) control and data flow analysis using facts extracted by ATGs, d) range analysis, and e) points-to analysis, both local and global. These primitive analyzers can be used to compose facts from the AST to draw conclusions about the code represented by the ASTs (e.g., "this statement modifies these variables"). A language front end packages the grammar and the language-specific analyzers together in a reusable bundle. DMS has such language front ends, of varying levels of depth and maturity, for a wide variety of languages.

[EDIT 6/27: The C and C++ front ends have support for specific dialects of C and C++: ANSI C, C99, GCC3/4 C, MS Visual C, ANSI C++98, ANSI C++11, GCC3/4 C++, MS Visual C++ 2005/2008/2010. If you want accurate analysis of code, you should use the "right" dialect to process your code.]

But "analysis" isn't the point. The purpose of analysis is to drive change. DMS provides additional support to procedurally modify the ASTs, to modify the ASTs by source-to-source rewrite rules written in the surface syntax of the language (both conditioned by some chosen analysis result), or to group sets of procedural and source-to-source rewrites together to make compound, complex rewrites that can carry off massive code changes such as re-architecting, etc. After the ASTs are transformed, they can be used to regenerate ("prettyprint") syntactically correct code in the corresponding front-end language/dialect. [By modifying an AST for one language piecewise until you have an AST for another, you can build translators, but this isn't as easy as this sentence implies].
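To make the parse / transform / prettyprint cycle concrete, here is a loose analogy using Python's standard `ast` module on Python source. This is not DMS and says nothing about DMS's actual interfaces; it only shows the general shape of a procedural tree rewrite (inserting a logging call at each function entry) followed by code regeneration. The names `LogEntry` and `instrument` are invented for the sketch, and `ast.unparse` needs Python 3.9 or later.

```python
import ast


class LogEntry(ast.NodeTransformer):
    """Insert a print() call at the top of every function body."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # rewrite nested functions as well
        call = ast.parse(f"print('enter {node.name}')").body[0]
        node.body.insert(0, call)
        return node


def instrument(source):
    """Return `source` with an entry log added to each function."""
    tree = ast.parse(source)          # parse: text -> AST
    tree = LogEntry().visit(tree)     # transform: procedural tree rewrite
    ast.fix_missing_locations(tree)   # give the inserted nodes line info
    return ast.unparse(tree)          # prettyprint: AST -> text
```

Note that the regenerated text comes from the tree, so comments and the original layout are lost in this toy version; preserving them is one of the things a production transformation system has to take care of.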

This all works to a considerable degree, yet is still somewhat stymied by certain language complications. For C and C++, a famous complication is the preprocessor; by editing the program text arbitrarily, preprocessor conditionals can render the source code unparseable by anything resembling standard parsing technology. DMS's C and C++ front ends ameliorate this somewhat and can parse code with well-structured preprocessor directives, including some strange cases that most people would not call structured but that commonly occur:

   #if cond
        if (abc)  {
   #else
        if (def)  {
   #endif
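A small sketch shows why this fragment defeats tools that look at the raw text. The fragment is written with the lowercase directives a C compiler accepts and completed with a hypothetical body and closing brace, added only for this illustration. The raw text contains two opening braces for one closing brace, yet each configuration on its own is balanced. The helper names `brace_balance` and `resolve` are made up, and `resolve` handles only a single, non-nested conditional group.

```python
# The conditional from the answer, plus an assumed body and closing brace.
FRAGMENT = """\
#if cond
    if (abc) {
#else
    if (def) {
#endif
        work();
    }
"""


def brace_balance(text):
    """Opening braces minus closing braces; 0 means balanced."""
    return text.count("{") - text.count("}")


def resolve(text, cond):
    """Keep one branch of a single, non-nested #if/#else/#endif group."""
    kept, keeping = [], True
    for line in text.splitlines(keepends=True):
        directive = line.strip()
        if directive.startswith("#if"):
            keeping = cond
        elif directive.startswith("#else"):
            keeping = not cond
        elif directive.startswith("#endif"):
            keeping = True
        elif keeping:
            kept.append(line)
    return "".join(kept)
```

Here `brace_balance(FRAGMENT)` is 1, while `brace_balance(resolve(FRAGMENT, True))` and `brace_balance(resolve(FRAGMENT, False))` are both 0. A tool that picks one configuration can parse the code but loses the other branch; a tool that keeps both has to cope with the unbalanced text.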

We are making interesting progress on parsing code with arbitrary placement of preprocessor conditionals. But once you do that, now all of your analyzers suddenly have to take the preprocessor conditionals into account and we're all suddenly on turf the compiler people have not really visited.

DMS has been used to make major architectural shifts in large C++ programs, converting from non-CORBA style to CORBA style with an immense amount of code shuffling, to extract code along arbitrary control flow paths to generate SOW-style APIs for existing C code, to insert instrumentation in large C programs to detect pointer errors, etc. [It has been applied to other tasks in many of those other languages].

In our own experience, it is still pretty hard to use. In our opinion, this is in the same sense that democracy is the worst of all systems of government except for all the rest; YMMV. The website has lots of DMS-derived tools and discussions.

It has in fact been used to extract functions (the SOW-exercise is much more general than that) and insert functions (this is a generalized case of instrumentation).

Tools like GCC-XML are shadows of DMS's capabilities. GCC-XML parses, builds symbol tables, and dumps data declarations (not code), but it can't make any code changes. Clang is better; it parses C and C++ to ASTs, can do analyses on the LLVM intermediate representation, and has some kind of mechanism for spitting out to-be-applied-later patches to source text inspired by a desired tree change. I don't know if Clang can carry out massive code transformations, especially those where one transformation's result is transformed again (how do you modify the tree for a delayed text patch?). DMS can do this all day long, and can do it for many languages other than C and C++, and can do it for an arbitrary mixture of the languages it knows.

Until the preprocessor problem with conditionals gets solved, analyzing/transforming C and C++ code will not be easy. We succeed in these tasks on these languages only by sheer willpower and by using the strongest tools we can build. (Java doesn't have these problems, and DMS is correspondingly better at analyzing/transforming it).

At severe risk of hubris, I believe DMS to be the best of the tools out there for general purpose analysis and transformation. As its architect, I view it as my long term job to make it ever stronger for this task.

Ira Baxter answered Oct 07 '22 00:10
