Getting the source behind clang's AST

Tags: c++, clang

Given an AST object in clang, how can I get the code behind it? I tried editing the code in the tutorial, and added:

clang::SourceLocation _b = d->getLocStart(), _e = d->getLocEnd();
const char *b = sourceManager->getCharacterData(_b),
           *e = sourceManager->getCharacterData(_e);
llvm::errs() << std::string(b, e-b) << "\n";

but alas, it didn't print the whole typedef declaration, only about half of it! The same thing happened when printing an Expr.

How can I print and see the whole original string constituting the declaration?


asked Jun 18 '12 12:06

mikebloch

4 Answers

Use the Lexer module:

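// Both globals must be set up from the compiler instance before calling decl2str().
clang::SourceManager *sm;
clang::LangOptions lopt;

std::string decl2str(clang::Decl *d) {
    // getLocEnd() points at the first character of the last token, not past its end.
    // (Newer Clang releases spell these getBeginLoc()/getEndLoc().)
    clang::SourceLocation b(d->getLocStart()), _e(d->getLocEnd());
    // Move the end location to just past the last token.
    clang::SourceLocation e(clang::Lexer::getLocForEndOfToken(_e, 0, *sm, lopt));
    return std::string(sm->getCharacterData(b),
        sm->getCharacterData(e)-sm->getCharacterData(b));
}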


answered Oct 22 '22 00:10

Elazar Leibovich

The following code works for me.

std::string decl2str(clang::Decl *d, clang::SourceManager &sm) {
    // (T, U) => "T,,"
    // getSourceText() returns an llvm::StringRef; .str() copies it into a std::string.
    std::string text = clang::Lexer::getSourceText(
        clang::CharSourceRange::getTokenRange(d->getSourceRange()), sm, clang::LangOptions(), nullptr).str();
    if (text.size() > 0 && (text.at(text.size()-1) == ',')) // the text can be ""
        return clang::Lexer::getSourceText(
            clang::CharSourceRange::getCharRange(d->getSourceRange()), sm, clang::LangOptions(), nullptr).str();
    return text;
}

answered Oct 21 '22 22:10

LucasWang

As the comments on the other answers point out, each of them seems to have flaws, so I'll post my own code, which seems to cover all the flaws mentioned there.

I believe that getSourceRange() considers the statement as a sequence of tokens, rather than a sequence of characters. This means that, if we have a clang::Stmt that corresponds to FOO + BAR, then the token FOO is at character 1, the token + at character 5, and the token BAR at character 7. getSourceRange() thus returns a SourceRange that essentially means "This code begins with the token at 1 and ends with the token at 7". So we have to use clang::Lexer::getLocForEndOfToken(stmt.getSourceRange().getEnd()) to get the actual, character-wise location just past the end of the BAR token, and pass that as the end location to clang::Lexer::getSourceText. If we don't, then clang::Lexer::getSourceText returns "FOO + " rather than "FOO + BAR", which is probably what we wanted.
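To make the off-by-one-token effect concrete, here is a minimal sketch that does not use Clang at all. ToyTokenRange, toy_token_length, text_as_char_range and text_as_token_range are hypothetical names invented for this illustration, and the toy "lexer" simply splits on spaces. Offsets are 0-based here, so the tokens at characters 1 and 7 above become offsets 0 and 6.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a token-based range (not a Clang type): both
// members are 0-based offsets of the FIRST character of a token.
struct ToyTokenRange {
    std::size_t first_token_start;
    std::size_t last_token_start;
};

// Crude substitute for measuring a token: scan until a space or end of input.
std::size_t toy_token_length(const std::string& src, std::size_t pos) {
    std::size_t end = pos;
    while (end < src.size() && src[end] != ' ') ++end;
    return end - pos;
}

// Treats last_token_start as a past-the-end offset, so the last token is lost.
std::string text_as_char_range(const std::string& src, ToyTokenRange r) {
    return src.substr(r.first_token_start,
                      r.last_token_start - r.first_token_start);
}

// Moves the end past the last token first, then extracts the text.
std::string text_as_token_range(const std::string& src, ToyTokenRange r) {
    const std::size_t end =
        r.last_token_start + toy_token_length(src, r.last_token_start);
    return src.substr(r.first_token_start, end - r.first_token_start);
}

// Usage check for the "FOO + BAR" example.
void check_foo_plus_bar() {
    const std::string src = "FOO + BAR";
    const ToyTokenRange r{0, 6};  // FOO starts at 0, BAR starts at 6
    assert(text_as_char_range(src, r) == "FOO + ");
    assert(text_as_token_range(src, r) == "FOO + BAR");
}
```

In real Clang code the "measure the last token" step is what clang::Lexer::getLocForEndOfToken does; the toy version only mirrors the arithmetic.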

I don't think my implementation has the problem @Steven Lu mentioned in the comments because this code uses the clang::Lexer::getSourceText function, which, according to Clang's source documentation, is designed specifically to obtain the source text from a range.

This implementation also takes @Ramin Halavati's remarks into account; I've tested it on some code, and it did return the macro-expanded string.

Here is my implementation:

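#include <string>
#include "clang/Basic/LangOptions.h"
#include "clang/Basic/SourceManager.h"
#include "clang/Lex/Lexer.h"

// Forward declaration: get_source_text() calls this helper, which is defined further down.
std::string get_source_text_raw(clang::SourceRange range, const clang::SourceManager& sm);

/**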
 * Gets the portion of the code that corresponds to given SourceRange, including the
 * last token. Returns expanded macros.
 * 
 * @see get_source_text_raw()
 */
std::string get_source_text(clang::SourceRange range, const clang::SourceManager& sm) {
    clang::LangOptions lo;

    // NOTE: sm.getSpellingLoc() used in case the range corresponds to a macro/preprocessed source.
    auto start_loc = sm.getSpellingLoc(range.getBegin());
    auto last_token_loc = sm.getSpellingLoc(range.getEnd());
    auto end_loc = clang::Lexer::getLocForEndOfToken(last_token_loc, 0, sm, lo);
    auto printable_range = clang::SourceRange{start_loc, end_loc};
    return get_source_text_raw(printable_range, sm);
}

/**
 * Gets the portion of the code that corresponds to given SourceRange exactly as
 * the range is given.
 *
 * @warning The end location of the SourceRange returned by some Clang functions 
 * (such as clang::Expr::getSourceRange) might actually point to the first character
 * (the "location") of the last token of the expression, rather than the character
 * past-the-end of the expression like clang::Lexer::getSourceText expects.
 * get_source_text_raw() does not take this into account. Use get_source_text()
 * instead if you want to get the source text including the last token.
 *
 * @warning This function does not obtain the source of a macro/preprocessor expansion.
 * Use get_source_text() for that.
 */
std::string get_source_text_raw(clang::SourceRange range, const clang::SourceManager& sm) {
    // getSourceText() returns an llvm::StringRef; .str() copies it into a std::string.
    return clang::Lexer::getSourceText(clang::CharSourceRange::getCharRange(range), sm, clang::LangOptions()).str();
}

answered Oct 21 '22 23:10

AnthonyD973

Elazar's method worked for me except when a macro was involved. The following correction resolved it:

// Uses the same global sm and lopt as Elazar's answer.
std::string decl2str(clang::Decl *d) {
    clang::SourceLocation b(d->getLocStart()), _e(d->getLocEnd());
    // Map macro locations to where the text is actually spelled.
    if (b.isMacroID())
        b = sm->getSpellingLoc(b);
    if (_e.isMacroID())
        _e = sm->getSpellingLoc(_e);
    clang::SourceLocation e(clang::Lexer::getLocForEndOfToken(_e, 0, *sm, lopt));
    return std::string(sm->getCharacterData(b),
        sm->getCharacterData(e)-sm->getCharacterData(b));
}

answered Oct 22 '22 00:10

Ramin Halavati
