I'm trying to automatically resolve typedefs in arbitrary C++ or C projects.
Because some of the typedefs are defined in system header files (for example uint32), I'm currently trying to achieve this by running the gcc preprocessor on my code files and then scanning the preprocessed files for typedefs. I should then be able to replace the typedefs in the project's code files.
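A minimal sketch of the scanning step, assuming the preprocessed text (e.g. from `gcc -E`) is already in a string and that only simple one-declarator typedefs matter. The helper name `collect_typedefs` is hypothetical. Function-pointer, array, anonymous-struct and comma-containing template typedefs are deliberately not matched, so a real tool would need a proper parser for those.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: collects "typedef <type> <name>;" declarations from
// preprocessed source text (e.g. the output of `gcc -E file.c`).
// Assumptions: one declarator per typedef; function-pointer, array,
// "typedef struct { ... } name;" and template types containing commas are
// not matched and are simply skipped.
std::map<std::string, std::string> collect_typedefs(const std::string& text) {
    // Group 1: the underlying type (identifiers, whitespace, '*', '&', ':', '<', '>').
    // Group 2: the new alias name, immediately before the ';'.
    static const std::regex typedef_re(
        R"(\btypedef\s+([A-Za-z_][\w\s*&:<>]*?)\s*\b([A-Za-z_]\w*)\s*;)");
    static const std::regex whitespace_re(R"(\s+)");

    std::map<std::string, std::string> aliases;
    for (std::sregex_iterator it(text.begin(), text.end(), typedef_re), end;
         it != end; ++it) {
        // Collapse runs of whitespace/newlines so "unsigned   int" -> "unsigned int".
        std::string type = std::regex_replace((*it)[1].str(), whitespace_re, " ");
        aliases[(*it)[2].str()] = type;
    }
    return aliases;
}
```

For example, `collect_typedefs("typedef unsigned int uint32;")` yields a map with the single entry `uint32 -> unsigned int`.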
I'm wondering if there is another, perhaps simpler, way that I'm missing. Can you think of one?
The reason I want to do this: I'm extracting code metrics from the C/C++ projects with different tools. The metrics are method-based. After extracting the metrics, I have to merge the data produced by the different tools. The problem is that one of the tools resolves typedefs and the others don't. If typedefs are used for the parameter types of methods, I end up with metrics mapped to different method names that actually refer to the same method in the source code.
Think of this method in the source code: int test(uint32 par1, int par2)
After running my tools, some of my metrics are mapped to a method named int test(uint32 par1, int par2) and others are mapped to int test(unsigned int par1, int par2).
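Once an alias table exists, the merge problem reduces to normalizing method names before comparing them. A minimal sketch, assuming a map from alias to underlying type (however it was obtained) and ignoring scopes and namespaces; `resolve_typedefs` is a hypothetical name. It substitutes whole identifiers only and repeats until nothing changes, so chains such as ntcstring -> c_string -> char const* are fully expanded.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: replaces every identifier in a method signature that
// names a known typedef with its underlying type, repeating until nothing
// changes, so aliases of aliases are fully resolved.
// Assumptions: `aliases` maps alias -> underlying type; scopes and namespaces
// are ignored; max_passes bounds the work for self-referential entries such
// as "typedef struct foo foo;".
std::string resolve_typedefs(std::string sig,
                             const std::map<std::string, std::string>& aliases,
                             int max_passes = 16) {
    static const std::regex ident_re(R"([A-Za-z_]\w*)");
    for (int pass = 0; pass < max_passes; ++pass) {
        std::string out;
        std::size_t pos = 0;  // end of the previous identifier in `sig`
        bool changed = false;
        for (std::sregex_iterator it(sig.cbegin(), sig.cend(), ident_re), end;
             it != end; ++it) {
            const std::smatch& m = *it;
            const std::size_t start = static_cast<std::size_t>(m.position());
            out.append(sig, pos, start - pos);  // copy text between identifiers
            auto found = aliases.find(m.str());
            if (found != aliases.end()) {
                out += found->second;           // alias -> underlying type
                changed = true;
            } else {
                out += m.str();                 // ordinary identifier, keep it
            }
            pos = start + m.str().size();
        }
        out.append(sig, pos, std::string::npos);  // trailing punctuation
        if (!changed) break;
        sig = out;
    }
    return sig;
}
```

With `uint32 -> unsigned int` in the map, both method names from the example normalize to `int test(unsigned int par1, int par2)`, so the metrics can be merged on that key.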
If you do not care about figuring out where they are defined, you can use objdump to dump the C++ symbol table, which resolves typedefs.
lorien$ objdump --demangle --syms foo
foo: file format mach-o-i386
SYMBOL TABLE:
00001a24 g 1e SECT 01 0000 .text dyld_stub_binding_helper
00001a38 g 1e SECT 01 0000 .text _dyld_func_lookup
...
00001c7c g 0f SECT 01 0080 .text foo::foo(char const*)
...
This snippet is from the following structure definition:
#include <string>

typedef char const* c_string;
struct foo {
    typedef c_string ntcstring;
    foo(ntcstring s): buf(s) {}
    std::string buf;
};
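The symbol reads foo::foo(char const*) rather than foo::foo(ntcstring) because a typedef only introduces another name for an existing type. The compiler, and therefore the mangled symbol, only ever sees the underlying type. A small check illustrating this, assuming the question's uint32 is a typedef for unsigned int (the helper `same_signature` is hypothetical):

```cpp
#include <cassert>
#include <type_traits>
#include <typeinfo>

typedef unsigned int uint32;  // assumption: the question's uint32 is unsigned int
typedef char const* c_string;

struct foo {
    typedef c_string ntcstring;  // alias of an alias, as in the snippet above
};

// The chain foo::ntcstring -> c_string -> char const* is resolved at compile time.
static_assert(std::is_same<foo::ntcstring, const char*>::value,
              "foo::ntcstring is just another name for const char*");

// Both lines declare the SAME function: a typedef cannot be used to overload.
int test(uint32 par1, int par2);
int test(unsigned int par1, int par2);

static_assert(std::is_same<decltype(&test), int (*)(unsigned int, int)>::value,
              "only one 'test' exists, and its type uses unsigned int");

// Runtime check of the same fact via RTTI (hypothetical helper name).
bool same_signature() {
    return typeid(int (*)(uint32, int)) == typeid(int (*)(unsigned int, int));
}
```

This is why any tool that works from compiled symbols reports the resolved form, while tools that work on source text may report the alias.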
This does require that you compile everything, and it will only show symbols in the resulting executable, so there are a few limitations. The other option is to have the linker dump a symbol map. For GNU tools add -Wl,-map and -Wl,name, where name is the name of the file to generate (see note). This approach does not demangle the names, but with a little work you can reverse engineer the compiler's mangling conventions. The output from the previous snippet will include something like:
0x00001CBE 0x0000005E [ 2] __ZN3fooC2EPKc
0x00001D1C 0x0000001A [ 2] __ZN3fooC1EPKc
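If you would rather not decode the map by hand, the C++ runtimes shipped with GCC (libstdc++) and Clang (libc++abi) expose their demangler as `abi::__cxa_demangle` in `<cxxabi.h>`. It is a vendor extension, not part of standard C++. A sketch that demangles the last field of each map line, assuming the Mach-O map layout shown above; `demangle` and `demangle_map` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cxxabi.h>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: demangles one Itanium-ABI symbol with the runtime's
// demangler, returning the input unchanged if it cannot be demangled.
std::string demangle(const std::string& sym) {
    // Mach-O prefixes every symbol with an extra '_' ("__ZN3fooC2EPKc").
    const std::size_t skip = (sym.rfind("__Z", 0) == 0) ? 1 : 0;
    // Only pass real "_Z..." names on, so short plain names such as "i"
    // are not mistaken for type encodings.
    if (sym.compare(skip, 2, "_Z") != 0) return sym;
    int status = 0;
    char* out = abi::__cxa_demangle(sym.c_str() + skip, nullptr, nullptr, &status);
    std::string result = (status == 0 && out != nullptr) ? std::string(out) : sym;
    std::free(out);  // the result is malloc'ed; free(nullptr) is a no-op
    return result;
}

// Hypothetical helper: assumes the symbol is the last whitespace-separated
// field on each linker-map line, as in the Mach-O output above.
std::vector<std::string> demangle_map(std::istream& in) {
    std::vector<std::string> names;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string field, last;
        while (fields >> field) last = field;
        if (!last.empty()) names.push_back(demangle(last));
    }
    return names;
}
```

Both map lines demangle to foo::foo(char const*): the C1/C2 distinction (complete-object versus base-object constructor) is not shown in the demangled form.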
You can decode these using the Itanium C++ ABI specification, which GCC and Clang follow. Once you get comfortable with how this works, the mangling table included with the ABI becomes priceless. The derivation in this case is:
<mangled-name> ::= '_Z' <encoding>
<encoding> ::= <name> <bare-function-type>
<name> ::= <nested-name>
<nested-name> ::= 'N' <source-name> <ctor-dtor-name> 'E'
<source-name> ::= <number> <identifier>
<ctor-dtor-name> ::= 'C2' # base object constructor
<bare-function-type> ::= <type>+
<type> ::= 'P' <type> # pointer to
<type> ::= <cv-qualifier> <type>
<cv-qualifier> ::= 'K' # constant
<type> ::= 'c' # character
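To make the derivation concrete, here is a toy recursive-descent decoder that implements only the productions listed above, plus C1 (the complete-object constructor that appears in the second map line). It illustrates how to read the grammar and is not a real demangler. `toy::Parser` and `toy::decode` are hypothetical names, and anything outside this tiny subset is rejected with `std::invalid_argument`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

namespace toy {

// Hypothetical toy decoder for ONLY this subset of the grammar:
//   '_Z' 'N' <source-name> ('C1' | 'C2') 'E' <type>+
//   <type> ::= 'P' <type> | 'K' <type> | 'c'
class Parser {
public:
    explicit Parser(const std::string& s) : s_(s) {}

    bool done() const { return i_ == s_.size(); }
    char peek() const { return done() ? '\0' : s_[i_]; }
    void skip(std::size_t n) { i_ += n; }

    void expect(const std::string& tok) {
        if (s_.compare(i_, tok.size(), tok) != 0)
            throw std::invalid_argument("expected '" + tok + "' at offset " +
                                        std::to_string(i_));
        i_ += tok.size();
    }

    // <source-name> ::= <number> <identifier>, e.g. "3foo" -> "foo"
    std::string source_name() {
        if (!std::isdigit(static_cast<unsigned char>(peek())))
            throw std::invalid_argument("expected <source-name> length");
        std::size_t len = 0;
        while (std::isdigit(static_cast<unsigned char>(peek())))
            len = len * 10 + static_cast<std::size_t>(s_[i_++] - '0');
        if (len > s_.size() - i_)
            throw std::invalid_argument("<source-name> runs past the end");
        std::string id = s_.substr(i_, len);
        i_ += len;
        return id;
    }

    // 'P' = pointer to, 'K' = const (a <cv-qualifier>), 'c' = char
    std::string type() {
        switch (peek()) {
        case 'P': ++i_; return type() + "*";
        case 'K': ++i_; return type() + " const";
        case 'c': ++i_; return "char";
        default: throw std::invalid_argument("unsupported <type> code");
        }
    }

private:
    const std::string& s_;
    std::size_t i_ = 0;
};

std::string decode(const std::string& mangled) {
    Parser p(mangled);
    if (mangled.rfind("__Z", 0) == 0) p.skip(1);  // Mach-O's extra leading '_'
    p.expect("_Z");                                // <mangled-name>
    p.expect("N");                                 // <nested-name>
    const std::string cls = p.source_name();       // class name, e.g. "foo"
    p.expect("C");                                 // <ctor-dtor-name>
    if (p.peek() != '1' && p.peek() != '2')        // C1 complete / C2 base object ctor
        throw std::invalid_argument("unsupported <ctor-dtor-name>");
    p.skip(1);
    p.expect("E");                                 // end of <nested-name>
    std::string params = p.type();                 // <bare-function-type> ::= <type>+
    while (!p.done()) params += ", " + p.type();
    return cls + "::" + cls + "(" + params + ")";
}

}  // namespace toy
```

Note how `PKc` is read outside-in: P (pointer to) K (const) c (char), which the decoder renders as char const*.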
Note: it looks like GNU has changed the arguments to ld, so you may want to check your local manual (man ld) to make sure that the map file generation option is -map filename in your version. In recent versions, use -Wl,-M and redirect stdout to a file (GNU ld also accepts -Wl,-Map=filename to write the map to a file directly).
You can use Clang (the LLVM C/C++ compiler front-end) to parse code in a way that preserves information on typedefs and even macros. It has a very nice C++ API for reading the data after the source code is read into the AST (abstract syntax tree). http://clang.llvm.org/
If you are instead looking for a simple program that already does the resolving for you (instead of the Clang programming API), I think you are out of luck, as I have never seen such a thing.