Given an AST object in clang, how can I get the code behind it? I tried editing the code in the tutorial, and added:
clang::SourceLocation _b = d->getLocStart(), _e = d->getLocEnd();
const char *b = sourceManager->getCharacterData(_b),
           *e = sourceManager->getCharacterData(_e);
llvm::errs() << std::string(b, e-b) << "\n";
but alas, it didn't print the whole typedef declaration, only about half of it! The same phenomenon happened when printing an Expr.
How can I print and see the whole original string constituting the declaration?
Use the Lexer module:
clang::SourceManager *sm;
clang::LangOptions lopt;
std::string decl2str(clang::Decl *d) {
    // Newer Clang releases rename these accessors to getBeginLoc()/getEndLoc().
    clang::SourceLocation b(d->getLocStart()), _e(d->getLocEnd());
    // _e points at the start of the last token; move it past that token's end.
    clang::SourceLocation e(clang::Lexer::getLocForEndOfToken(_e, 0, *sm, lopt));
    return std::string(sm->getCharacterData(b),
                       sm->getCharacterData(e)-sm->getCharacterData(b));
}
The following code works for me.
std::string decl2str(clang::Decl *d, clang::SourceManager &sm) {
    using namespace clang;
    // (T, U) => "T,,"
    // getSourceText returns an llvm::StringRef, so convert explicitly with .str().
    std::string text = Lexer::getSourceText(CharSourceRange::getTokenRange(d->getSourceRange()), sm, LangOptions(), nullptr).str();
    if (text.size() > 0 && (text.at(text.size()-1) == ',')) // the text can be ""
        return Lexer::getSourceText(CharSourceRange::getCharRange(d->getSourceRange()), sm, LangOptions(), nullptr).str();
    return text;
}
As pointed out in the comments on the other answers, each of them seems to have flaws, so I'll post my own code, which seems to cover all the flaws mentioned there.
I believe that getSourceRange() considers the statement as a sequence of tokens, rather than a sequence of characters. This means that, if we have a clang::Stmt that corresponds to FOO + BAR, then the token FOO is at character 1, the token + at character 5, and the token BAR at character 7. getSourceRange() thus returns a SourceRange that essentially means "This code begins with the token at 1 and ends with the token at 7". So we have to use clang::Lexer::getLocForEndOfToken(stmt.getSourceRange().getEnd()) to get the actual, character-wise location of the end of the BAR token, and pass that as the end location to clang::Lexer::getSourceText. If we don't, then clang::Lexer::getSourceText would return "FOO + ", rather than "FOO + BAR" as we probably wanted.
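To make the token-versus-character distinction concrete, here is a toy model in plain C++ that does not use Clang at all. The TokenRange struct and the end_of_token, text_as_char_range, and text_as_token_range helpers are hypothetical names invented for this sketch; end_of_token only plays the role that clang::Lexer::getLocForEndOfToken plays in the real API, and it handles nothing beyond identifiers and one-character tokens. Offsets here are 0-based, so FOO, + and BAR sit at 0, 4 and 6.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Toy stand-in for a token-based range (hypothetical, not a Clang class):
// both fields are 0-based offsets of the FIRST character of a token.
struct TokenRange {
    std::size_t begin;       // start of the first token
    std::size_t last_token;  // start of the last token
};

// Hypothetical helper playing the role of Lexer::getLocForEndOfToken:
// returns the offset one past the end of the token starting at tok_start.
std::size_t end_of_token(const std::string& src, std::size_t tok_start) {
    assert(tok_start <= src.size());
    if (tok_start == src.size()) return tok_start;
    std::size_t i = tok_start;
    while (i < src.size() &&
           (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
        ++i;
    }
    // A non-identifier token such as '+' is treated as one character long.
    return i == tok_start ? tok_start + 1 : i;
}

// Treats last_token as a past-the-end offset, so the last token is dropped.
std::string text_as_char_range(const std::string& src, TokenRange r) {
    return src.substr(r.begin, r.last_token - r.begin);
}

// Extends the end past the last token before extracting the text.
std::string text_as_token_range(const std::string& src, TokenRange r) {
    return src.substr(r.begin, end_of_token(src, r.last_token) - r.begin);
}
```

With the source "FOO + BAR;" and TokenRange{0, 6}, text_as_char_range gives "FOO + " while text_as_token_range gives "FOO + BAR", which mirrors the truncation described above.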
I don't think my implementation has the problem @Steven Lu mentioned in the comments because this code uses the clang::Lexer::getSourceText function, which, according to Clang's source documentation, is designed specifically to obtain the source text from a range.
This implementation also takes @Ramin Halavati's remarks into account; I've tested it on some code, and it indeed returned the macro-expanded string.
Here is my implementation:
// Forward declaration: get_source_text() calls get_source_text_raw(), which is defined further down.
std::string get_source_text_raw(clang::SourceRange range, const clang::SourceManager& sm);

/**
 * Gets the portion of the code that corresponds to given SourceRange, including the
 * last token. Returns expanded macros.
 *
 * @see get_source_text_raw()
 */
std::string get_source_text(clang::SourceRange range, const clang::SourceManager& sm) {
    clang::LangOptions lo;
    // NOTE: sm.getSpellingLoc() used in case the range corresponds to a macro/preprocessed source.
    auto start_loc = sm.getSpellingLoc(range.getBegin());
    auto last_token_loc = sm.getSpellingLoc(range.getEnd());
    auto end_loc = clang::Lexer::getLocForEndOfToken(last_token_loc, 0, sm, lo);
    auto printable_range = clang::SourceRange{start_loc, end_loc};
    return get_source_text_raw(printable_range, sm);
}
/**
 * Gets the portion of the code that corresponds to given SourceRange exactly as
 * the range is given.
 *
 * @warning The end location of the SourceRange returned by some Clang functions
 * (such as clang::Expr::getSourceRange) might actually point to the first character
 * (the "location") of the last token of the expression, rather than the character
 * past-the-end of the expression like clang::Lexer::getSourceText expects.
 * get_source_text_raw() does not take this into account. Use get_source_text()
 * instead if you want to get the source text including the last token.
 *
 * @warning This function does not obtain the source of a macro/preprocessor expansion.
 * Use get_source_text() for that.
 */
std::string get_source_text_raw(clang::SourceRange range, const clang::SourceManager& sm) {
    // getSourceText returns an llvm::StringRef; .str() converts it to std::string.
    return clang::Lexer::getSourceText(clang::CharSourceRange::getCharRange(range), sm, clang::LangOptions()).str();
}
Elazar's method worked for me except when a macro was involved. The following correction resolved it:
std::string decl2str(clang::Decl *d) {
    clang::SourceLocation b(d->getLocStart()), _e(d->getLocEnd());
    // Map locations that come from a macro back to where the text was actually written.
    if (b.isMacroID())
        b = sm->getSpellingLoc(b);
    if (_e.isMacroID())
        _e = sm->getSpellingLoc(_e);
    clang::SourceLocation e(clang::Lexer::getLocForEndOfToken(_e, 0, *sm, lopt));
    return std::string(sm->getCharacterData(b),
                       sm->getCharacterData(e)-sm->getCharacterData(b));
}