Python script to print all function definitions of a C/C++ file

I want a Python script that prints a list of all functions defined in a C/C++ file.

e.g. abc.c defines two functions as:

void func1() { }
int func2(int i) { printf("%d", i); return 1; }

I just want to search the file (abc.c) and print the names of all the functions defined in it (names only). In the example above, the Python script should print func1 and func2.

anonymous asked Oct 25 '09 12:10


4 Answers

I would suggest using the PLY lex/yacc tool. There's a prebuilt C parser, and the parser itself is quite fast. Once you have the file parsed, it shouldn't be too hard to find all of the functions.

http://www.dabeaz.com/ply/

pavpanchekha answered Nov 11 '22 06:11


ANTLR is your tool.

Tzury Bar Yochay answered Nov 11 '22 05:11


To do this reliably, you'd need to parse the C or C++ code, and then grab the function definitions from the AST the parser produces.

C is fairly easy to parse. As pavpanchekha mentions, the parser generator PLY comes with a C parser, and it has been used to build the following relevant projects:

  • pycparser
  • CppHeaderParser
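As a rough sketch of the AST approach with pycparser (a third-party package, installed separately): the helper name list_functions below is made up for illustration, and the sketch assumes the input is already preprocessed C, since pycparser does not handle #include lines, macros, or comments by itself.

```python
from typing import List


def list_functions(source: str) -> List[str]:
    """Return the names of all functions defined in preprocessed C source."""
    # Third-party dependency (pip install pycparser), imported here so the
    # requirement is visible right where it is used.
    from pycparser import c_ast, c_parser

    class FuncDefVisitor(c_ast.NodeVisitor):
        """Collects the name of every FuncDef node in the AST."""

        def __init__(self):
            self.names = []

        def visit_FuncDef(self, node):
            # node.decl is the declaration part of the function definition.
            self.names.append(node.decl.name)

    ast = c_parser.CParser().parse(source, filename="abc.c")
    visitor = FuncDefVisitor()
    visitor.visit(ast)
    return visitor.names


if __name__ == "__main__":
    code = (
        "void func1() { }\n"
        'int func2(int i) { printf("%d", i); return 1; }\n'
    )
    print(", ".join(list_functions(code)))
```

For a real file with #include directives, the source would first have to go through a C preprocessor; pycparser's own documentation describes how to do that.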

Parsing C++ code is more complicated. The question "Is there a good Python library that can parse C++" should be of help:

C++ is notoriously hard to parse. Most people who try to do this properly end up taking apart a compiler. In fact this is (in part) why LLVM started: Apple needed a way they could parse C++ for use in XCode that matched the way the compiler parsed it.

That's why there are projects like GCC-XML, which you could combine with a Python XML library.

Finally, if your code doesn't need to be robust at all, you could run the code through a code reformatter, like indent (for C code), to even things out, then use regular expressions to match the function definitions. Yes, this is a bad, hacky, error-prone idea, and you'll probably find function definitions in multiline comments and such, but it might work well enough.
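A minimal sketch of that regex hack, using only the standard library. The function name guess_function_names and both patterns are my own illustration, not an established recipe. It skips the indent step, blanks out comments and string literals first (which sidesteps the multiline-comment problem), and only accepts matches at brace depth zero, so functions nested inside a C++ class or namespace are missed, as are parameter lists containing parentheses (e.g. function pointers).

```python
import re

# Text that could contain misleading braces or parentheses.
_NOISE = re.compile(
    r"/\*.*?\*/"            # block comment
    r"|//[^\n]*"            # line comment
    r'|"(?:\\.|[^"\\])*"'   # string literal
    r"|'(?:\\.|[^'\\])*'",  # character literal
    re.DOTALL,
)

# identifier ( params ) {   where params contain no ; { } ( )
_CANDIDATE = re.compile(r"\b([A-Za-z_]\w*)\s*\(([^;{}()]*)\)\s*\{")


def guess_function_names(source):
    """Heuristically list names of functions defined at file scope."""
    text = _NOISE.sub(" ", source)
    names = []
    for match in _CANDIDATE.finditer(text):
        # Brace depth before the match; anything inside a body (if, for,
        # while, ...) has depth > 0 and is skipped.
        depth = text.count("{", 0, match.start()) - text.count("}", 0, match.start())
        if depth == 0:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    sample = (
        "void func1() { }\n"
        'int func2(int i) { printf("%d", i); return 1; }\n'
        "/* int fake(void) { } */\n"
    )
    print(guess_function_names(sample))  # ['func1', 'func2']
```

Macros that expand to braces or function headers will still fool it, which is the main reason a real parser is the safer route.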

dbr answered Nov 11 '22 04:11


This page, Parsing C++, mentions a couple of ANTLR grammars for C++. Since ANTLR has a Python API this seems like a reasonable way to proceed.

Even though parsing may seem a lot more complex than regular expressions, this is a case where someone else has done almost all the work for you and you just need to interface to it from Python.

Another alternative, where someone else has done the work of parsing C++ for you, is pygccxml, which leverages GCC-XML, an output extension for GCC that produces XML from the compiler's internal representation. Since Python has great XML support, you just need to extract the information of interest to you.

Michael Dillon answered Nov 11 '22 04:11