Obtain original (unexpanded) macro text using libclang

Tags:

clang

libclang

Using libclang, I have a cursor into an AST that corresponds to the statement resulting from a macro expansion. I want to retrieve the original, unexpanded macro text.

I've looked for a libclang API to do this, and can't find one. Am I missing something?

Assuming such an API doesn't exist, I see a couple of ways to go about this, both based on using clang_getCursorExtent() to obtain the source range of the cursor, which is presumably the range of the original text.

The first idea is to use clang_getFileLocation() to obtain the file and position of the range start and end, and to read the text directly from the file. If I've compiled from unsaved files then I need to deal with that, but my main concern with this approach is that it just doesn't seem right to go out to the filesystem when I'm sure clang holds all this information internally. There would also be implications if the AST has been loaded rather than generated, or if the source files have been modified since they were parsed.

The second approach is to call clang_tokenize() on the cursor extent. I tried this and found that it fails to produce a token list for most of the cursors in the AST. Tracing into the code, it turns out that clang_tokenize() internally manipulates the supplied range, concludes that it spans multiple files (presumably due to some effect of the macro expansion), and aborts. This seems incorrect to me, but in any case I do feel that I'm abusing clang_tokenize() by trying to do this.

So, what's the best approach?

Jeremy asked May 28 '13 07:05


1 Answer

This is the only way I've found.

So you get the top-level cursor with clang_getTranslationUnitCursor(). Then you call clang_visitChildren() with a visitor function that returns CXChildVisit_Continue, so that only the immediate children are visited. Among the children you see the usual cursor kinds for top-level declarations (like CXCursor_TypedefDecl and CXCursor_EnumDecl), but there are also cursors of kind CXCursor_MacroExpansion. Every single macro expansion appears to show up as a cursor of this kind. (These cursors only appear if the translation unit was parsed with the CXTranslationUnit_DetailedPreprocessingRecord option.) You can then call clang_tokenize() on the extent (clang_getCursorExtent()) of any of these cursors, and it gives you the unexpanded macro text.

I have no idea why macro expansions get stuck near the top of the AST instead of within the elements where they are used; it makes things pretty awkward. Example:

#define SOMEMACRO 1  /* hypothetical value, just so the example compiles */

enum someEnum{
    one = SOMEMACRO,
    two,
    three
};

It'd be nice if the macro expansion cursor for SOMEMACRO were within the enum declaration instead of being a sibling to it.

(I realize this is ridiculously late, but I'm hoping this gets libclang more exposure; maybe someone more experienced with it can provide more insight.)

sciencectn answered Nov 06 '22 00:11