
(clang) How to parse macros themselves, getting an ast where possible?

Tags:

clang

Hi, I'm using clang to extract information from C files, and I'm trying to extract the values of macros.

e.g. from this I'd want the value '13' or an AST (+ (* 3 4) 1):

#define SOME_CONSTANT 3*4+1

or, from a function-like macro, I'd want an AST, e.g. (SOME_MACROFUNC (x y) (+ (add4 x) (* y 9))):

int add4(int q) {return q+4;}
#define SOME_MACROFUNC(x,y) add4(x)+y*9

So far I've managed to iterate through all the macros via the 'Preprocessor' class's macro_begin() and macro_end() functions.

Then from that I've gotten the macro names, and from the 'MacroInfo' class I've been able to find out whether each macro is function-like (including its parameter names) or not. I've also got access to the tokens in the macro, but I can only get the token kind, e.g. string_literal, identifier, comma, l_paren, r_paren, etc.

So two things:

  1. How do I access the actual values of the tokens, rather than just their kinds?

  2. Is there a way to generate an AST from the macros given their tokens? One way I thought of would be to parse my source code, extract the macros, and then, using their names, add code that uses those macros to my source and reparse it to get the AST.

e.g. Something like:

int tempSOME_CONSTANT = SOME_CONSTANT;    
int tempSOME_MACROFUNC(int x, int y) {return SOME_MACROFUNC(x,y);}

This method seems really hacky, though, and would probably have trouble with macros that aren't statement- or expression-like.

Thanks.

Edit: To clarify, I mainly want the fully expanded body of each macro (expanded until no macros are left, only non-macro tokens).

Edit 2: Solved, somewhat:

If anyone's interested, I intend to expand the body of the macro manually, using:

"preprocessor.getSpelling(token)" to get the token value.

"preprocessor.getIdentifierTable().get(StringRef(spelling))" to get the IdentifierInfo for the token.

"clang\lib\Lex\PPMacroExpansion.cpp" as a reference.

I'm still thinking about how to pass the result to the parser without reparsing the whole source tree, but that shouldn't be too difficult to figure out.

Thanks to Ira Baxter for the discussion; it helped me iron out the problem.

asked Jun 06 '12 19:06 by joesmoe891


1 Answer

I am working on something very similar. I use the clang front end to collect the context (class, function, etc.) in which a macro is defined, and then use a (pseudo) expression parser to figure out whether the macro body is a valid expression. The ultimate goal is to transform the macro into a C++ declaration. We recently had a paper accepted at ICSM 2012 that explains how we achieve this.

The tool we use to get rid of macros (the demacrofier) is hosted here.

Ira Baxter's examples give good insight into the ways macros are used. However, such macros make up only a very small percentage of real-world macro use (see "An Empirical Analysis of C Preprocessor Use" by Ernst et al.). Currently, I am focusing on the common cases.

answered Oct 13 '22 01:10 by A. K.