Getting the source behind clang's AST

Tags: c++, clang

Given an AST object in clang, how can I get the code behind it? I tried editing the code in the tutorial, and added:

clang::SourceLocation _b = d->getLocStart(), _e = d->getLocEnd();
const char *b = sourceManager->getCharacterData(_b),
           *e = sourceManager->getCharacterData(_e);
llvm::errs() << std::string(b, e-b) << "\n";

but alas, it didn't print the whole typedef declaration, only about half of it! The same thing happened when printing an Expr.

How can I print and see the whole original string constituting the declaration?

mikebloch asked Jun 18 '12 12:06



4 Answers

Use the Lexer module:

clang::SourceManager *sm;
clang::LangOptions lopt;

std::string decl2str(clang::Decl *d) {
    clang::SourceLocation b(d->getLocStart()), _e(d->getLocEnd());
    clang::SourceLocation e(clang::Lexer::getLocForEndOfToken(_e, 0, *sm, lopt));
    return std::string(sm->getCharacterData(b),
        sm->getCharacterData(e)-sm->getCharacterData(b));
}
Elazar Leibovich answered Oct 22 '22 00:10


The following code works for me.

std::string decl2str(clang::Decl *d, clang::SourceManager &sm) {
    // (T, U) => "T,,"
    // getSourceText returns an llvm::StringRef, so convert explicitly with .str().
    std::string text = clang::Lexer::getSourceText(clang::CharSourceRange::getTokenRange(d->getSourceRange()), sm, clang::LangOptions(), nullptr).str();
    if (!text.empty() && text.back() == ',') // the text can be ""
        return clang::Lexer::getSourceText(clang::CharSourceRange::getCharRange(d->getSourceRange()), sm, clang::LangOptions(), nullptr).str();
    return text;
}
LucasWang answered Oct 21 '22 22:10


As the comments on the other answers point out, each of them has flaws, so I'll post my own code, which seems to address every flaw mentioned in those comments.

I believe that getSourceRange() considers the statement as a sequence of tokens, rather than a sequence of characters. This means that, if we have a clang::Stmt that corresponds to FOO + BAR, then the token FOO is at character 1, the token + at character 5, and the token BAR at character 7. getSourceRange() thus returns a SourceRange that essentially means "This code begins with the token at 1 and ends with the token at 7". So we have to use clang::Lexer::getLocForEndOfToken(stmt.getSourceRange().getEnd()) to get the actual, character-wise location just past the last character of the BAR token, and pass that as the end location to clang::Lexer::getSourceText. If we don't, then clang::Lexer::getSourceText would return "FOO + ", rather than "FOO + BAR" as we probably wanted.
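To make the token-versus-character distinction concrete, here is a small standalone sketch. It does not use Clang at all: TokenRange, end_of_token, text_as_char_range and text_as_token_range are hypothetical names I made up for illustration, offsets are 0-based (so FOO starts at 0, + at 4 and BAR at 6), and the "lexer" is a toy that treats a token as either a run of identifier characters or a single other character.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Toy stand-in for a token-based range (NOT a Clang type). Both fields are
// 0-based offsets of the FIRST character of a token.
struct TokenRange {
    std::size_t begin; // where the first token starts
    std::size_t end;   // where the LAST token starts
};

// Toy analogue of "find the end of this token": returns the offset one past
// the token that starts at tok_start. A token is a run of identifier
// characters, or a single character of any other kind.
std::size_t end_of_token(const std::string& src, std::size_t tok_start) {
    assert(tok_start < src.size());
    auto is_ident = [](unsigned char c) { return std::isalnum(c) || c == '_'; };
    if (!is_ident(static_cast<unsigned char>(src[tok_start])))
        return tok_start + 1;
    std::size_t i = tok_start;
    while (i < src.size() && is_ident(static_cast<unsigned char>(src[i])))
        ++i;
    return i;
}

// Reads [begin, end) literally, the way a character range is interpreted.
// The last token is cut off because `end` points at its first character.
std::string text_as_char_range(const std::string& src, TokenRange r) {
    return src.substr(r.begin, r.end - r.begin);
}

// Moves the end to one past the last token before reading, so the last
// token is included.
std::string text_as_token_range(const std::string& src, TokenRange r) {
    const std::size_t end = end_of_token(src, r.end);
    return src.substr(r.begin, end - r.begin);
}
```

With the source string "FOO + BAR" and TokenRange{0, 6}, text_as_char_range gives "FOO + " and text_as_token_range gives "FOO + BAR", which mirrors the behaviour described above.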

I don't think my implementation has the problem @Steven Lu mentioned in the comments because this code uses the clang::Lexer::getSourceText function, which, according to Clang's source documentation, is designed specifically to obtain the source text from a range.

This implementation also takes @Ramin Halavati's remarks into account; I've tested it on some code, and it indeed returned the macro-expanded string.

Here is my implementation:

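// Forward declaration: get_source_text() calls this before its definition below.
std::string get_source_text_raw(clang::SourceRange range, const clang::SourceManager& sm);

/**
 * Gets the portion of the code that corresponds to the given SourceRange, including the
 * last token. Returns expanded macros.
 *
 * @see get_source_text_raw()
 */
std::string get_source_text(clang::SourceRange range, const clang::SourceManager& sm) {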
    clang::LangOptions lo;

    // NOTE: sm.getSpellingLoc() used in case the range corresponds to a macro/preprocessed source.
    auto start_loc = sm.getSpellingLoc(range.getBegin());
    auto last_token_loc = sm.getSpellingLoc(range.getEnd());
    auto end_loc = clang::Lexer::getLocForEndOfToken(last_token_loc, 0, sm, lo);
    auto printable_range = clang::SourceRange{start_loc, end_loc};
    return get_source_text_raw(printable_range, sm);
}

/**
 * Gets the portion of the code that corresponds to given SourceRange exactly as
 * the range is given.
 *
 * @warning The end location of the SourceRange returned by some Clang functions 
 * (such as clang::Expr::getSourceRange) might actually point to the first character
 * (the "location") of the last token of the expression, rather than the character
 * past-the-end of the expression like clang::Lexer::getSourceText expects.
 * get_source_text_raw() does not take this into account. Use get_source_text()
 * instead if you want to get the source text including the last token.
 *
 * @warning This function does not obtain the source of a macro/preprocessor expansion.
 * Use get_source_text() for that.
 */
std::string get_source_text_raw(clang::SourceRange range, const clang::SourceManager& sm) {
    // getSourceText returns an llvm::StringRef; .str() converts it explicitly.
    return clang::Lexer::getSourceText(clang::CharSourceRange::getCharRange(range), sm, clang::LangOptions()).str();
}
AnthonyD973 answered Oct 21 '22 23:10


Elazar's method worked for me except when a macro was involved. The following correction resolved it:

std::string decl2str(clang::Decl *d) {
    clang::SourceLocation b(d->getLocStart()), _e(d->getLocEnd());
    if (b.isMacroID())
        b = sm->getSpellingLoc(b);
    if (_e.isMacroID())
        _e = sm->getSpellingLoc(_e);
    clang::SourceLocation e(clang::Lexer::getLocForEndOfToken(_e, 0, *sm, lopt));
    return std::string(sm->getCharacterData(b),
        sm->getCharacterData(e)-sm->getCharacterData(b));
}
Ramin Halavati answered Oct 22 '22 00:10