clang fails replacing a statement if it contains a macro

Tags: c++, clang, clang++

I'm using clang (through the C++ API) to parse some C++ files and make all the case/break pairs follow a specific style.

Example:

**Original**

switch(...)
{
   case 1:
   {
      <code>
   }break;

   case 2:
   {
      <code>
      break;
   }
}

**After replacement**

switch(...)
{
   case 1:
   {
      <code>
      break;
   }

   case 2:
   {
      <code>
      break;
   }
}

What I have so far does exactly what I want as long as the code parts don't contain any macros. My question is: does clang treat expanded macros differently? (If I dump a problematic statement, it shows the expanded version.) If so, how can I get this to work?

Additional info that may help:

I'm using Rewriter::ReplaceStmt to replace the sub-statements of each case with a newly created CompoundStmt. I noticed that ReplaceStmt returns true when the "from" parameter contains a macro, and the only way that method returns true is if

Rewriter::getRangeSize(from->getSourceRange())

returns -1

user1233963 asked Jun 05 '14 14:06


2 Answers

Your problem is caused by the design of SourceLocation.

An article follows:


Macro expansion with clang's SourceLocation

SourceLocation is designed to be flexible enough to handle both unexpanded locations and macro expanded locations at the same time.

If the token is the result of an expansion, then there are two different locations to take into account: the spelling location (the location of the characters corresponding to the token) and the instantiation location (the location where the token was used, i.e. the macro instantiation point, also called the expansion location).

Let's take the following simple source file as an example:

#define MACROTEST bool

int main() {

    int var = 2;
    switch(var)
    {
       case 1:
       {
          MACROTEST newvar;
       }break;

       case 2:
       {
          MACROTEST newvar;
          break;
       }
    }

    return 0;
}

and suppose we want to replace the two declaration statements

MACROTEST newvar;

with the declaration statement

int var = 2;

in order to get something like this

#define MACROTEST bool

int main() {

    int var = 2;
    switch(var)
    {
       case 1:
       {
          int var = 2;
       }break;

       case 2:
       {
          int var = 2;
          break;
       }
    }

    return 0;
}

if we output the AST (-ast-dump) we get the following (I'm including an image since it's more intuitive than just uncolored text):

[image: clang AST dump]

As you can see, the location reported for the first DeclStmt we're interested in spans from line 1 to line 10. In other words, the dump reports a range that starts on the line where the macro is defined and ends at the point where the macro is used:

#define MACROTEST [from_here]bool

int main() {

    int var = 2;
    switch(var)
    {
       case 1:
       {
          MACROTEST newvar[to_here];
       }break;

       case 2:
       {
          MACROTEST newvar;
          break;
       }
    }

    return 0;
}

(note that the column numbers may differ if you indent with spaces, since my text editor used tabs)

Ultimately, this is triggering the Rewriter::getRangeSize failure (-1) and the subsequent Rewriter::ReplaceStmt true return value (which means failure - see documentation).
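To make that failure mode concrete, here is a toy model in plain C++ with no Clang headers. ToyLoc, toyRangeSize and toyReplaceFails are hypothetical names invented for this sketch, not Clang types or functions. The only assumption is the behaviour described above: the range cannot be measured when an endpoint is a macro location, and the replace call then reports failure by returning true. The toy also ignores the length of the last token, which the real rewriter takes into account.

```cpp
// Hypothetical stand-in for a source location; NOT clang::SourceLocation.
struct ToyLoc {
    unsigned offset;  // character offset the location resolves to
    bool isMacro;     // true if the location lives inside a macro expansion
};

// Toy version of the guard: a range can only be measured when both
// endpoints are plain file locations and the range is not reversed.
int toyRangeSize(ToyLoc begin, ToyLoc end) {
    if (begin.isMacro || end.isMacro) return -1;
    if (end.offset < begin.offset) return -1;
    return static_cast<int>(end.offset - begin.offset);
}

// Toy version of a replace call that, like the Rewriter methods,
// returns true on failure.
bool toyReplaceFails(ToyLoc begin, ToyLoc end) {
    return toyRangeSize(begin, end) == -1;
}
```

Calling toyReplaceFails with a macro start and a file end gives true, which mirrors what you are seeing with ReplaceStmt; once both endpoints are file locations the call goes through.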

What is happening is the following: you're receiving a pair of SourceLocation markers where the first is a macro ID (isMacroID() would return true) while the second isn't.

To get the correct extent of the macro-expanded statement, we need to take a step back and talk to the SourceManager, which is the gateway for all queries about spelling locations and instantiation locations (see above if you don't remember these terms). I can't be clearer than the detailed description provided in the documentation:

The SourceManager can be queried for information about SourceLocation objects, turning them into either spelling or expansion locations. Spelling locations represent where the bytes corresponding to a token came from and expansion locations represent where the location is in the user's view. In the case of a macro expansion, for example, the spelling location indicates where the expanded token came from and the expansion location specifies where it was expanded.

At this point it should be clear why I explained all of this in the first place: if you intend to use source ranges for your substitution, you need to use the appropriate expansion interval.

Back to the sample I proposed, this is the code to achieve it:

// Note: newer Clang releases name these getBeginLoc()/getEndLoc()
SourceLocation startLoc = declaration_statement->getLocStart();
SourceLocation endLoc = declaration_statement->getLocEnd();

if( startLoc.isMacroID() ) {
    // Get the start/end expansion locations
    // (newer Clang releases return a CharSourceRange here instead of a pair)
    std::pair< SourceLocation, SourceLocation > expansionRange =
             rewriter.getSourceMgr().getImmediateExpansionRange( startLoc );

    // We're just interested in the start location
    startLoc = expansionRange.first;
}

if( endLoc.isMacroID() ) {
    // Not reached in this example: the end location is already a file location
}

SourceRange expandedLoc( startLoc, endLoc );
// ReplaceText returns true on failure
bool failure = rewriter.ReplaceText( expandedLoc,
                                     replacer_statement->getSourceRange() );

if( !failure )
    std::cout << "This will get printed if you did it correctly!";

The declaration_statement is either one of the two

MACROTEST newvar;

while replacer_statement is the statement used for the replacement

int var = 2;

The above code will get you this:

#define MACROTEST bool

int main() {

    int var = 2;
    switch(var)
    {
       case 1:
       {
          int var = 2;
       }break;

       case 2:
       {
          int var = 2;
          break;
       }
    }

    return 0;
}

i.e. a complete and successful substitution of the macro-expanded statement.
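If it helps to see why the start of the range matters so much, here is a small self-contained sketch that works on a plain std::string instead of a Clang buffer. replaceChars is a hypothetical helper written for this illustration (it is not part of the Rewriter API), and the offsets play the role of the spelling and expansion locations discussed above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not a Clang API: replace the inclusive character
// range [begin, end] of `source` with `replacement`.
std::string replaceChars(const std::string& source, std::size_t begin,
                         std::size_t end, const std::string& replacement) {
    assert(begin <= end && end < source.size());
    std::string result = source;
    result.replace(begin, end - begin + 1, replacement);
    return result;
}
```

With the source "#define MACROTEST bool\nMACROTEST newvar;\n", starting the range at the use of MACROTEST on the second line (the expansion location) replaces just the declaration. Starting it at "bool" on the first line (the spelling location) would instead swallow everything from the macro body up to the semicolon, which is the range shown with [from_here] and [to_here] earlier.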


References:

  • clang documentation
  • clang doxygen API
  • clang source code
Marco A. answered Oct 30 '22 05:10


In order to get the file location related to the macro expansion, an API function can be used to retrieve the information:

SourceLocation startLoc = rewriter.getSourceMgr().getFileLoc(
    declaration_statement->getLocStart());
SourceLocation endLoc = rewriter.getSourceMgr().getFileLoc(
    declaration_statement->getLocEnd());

This API function does what Marco's code does, but automatically.

Its documentation reads: "Given Loc, if it is a macro location return the expansion location or the spelling location, depending on if it comes from a macro argument or not."

The implementation of getFileLoc() looks like this:

SourceLocation getFileLoc(SourceLocation Loc) const {
    if (Loc.isFileID()) return Loc;
    return getFileLocSlowCase(Loc);
}

SourceLocation SourceManager::getFileLocSlowCase(SourceLocation Loc) const {
    do {
        if (isMacroArgExpansion(Loc))
            Loc = getImmediateSpellingLoc(Loc);
        else
            Loc = getImmediateExpansionRange(Loc).first;
    } while (!Loc.isFileID());
    return Loc;
}
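The loop is easier to follow on a toy model. The sketch below is plain C++ without Clang: ToyLoc, ToyExpansion and ToySourceManager are hypothetical names made up for this illustration, and the convention that IDs from 1000 upwards are macro locations is an arbitrary assumption of the toy. It only mirrors the shape of the loop: keep unwrapping a macro location until a plain file location remains.

```cpp
#include <cassert>
#include <map>

// Toy model only: hypothetical stand-ins, not Clang classes.
// IDs below kFirstMacroId are file offsets; the rest are macro locations.
using ToyLoc = unsigned;
constexpr ToyLoc kFirstMacroId = 1000;

struct ToyExpansion {
    ToyLoc spelling;        // where the token's characters were written
    ToyLoc expansionStart;  // where the macro was invoked
    bool isMacroArg;        // true if the token came from a macro argument
};

struct ToySourceManager {
    std::map<ToyLoc, ToyExpansion> expansions;

    static bool isFileLoc(ToyLoc loc) { return loc < kFirstMacroId; }

    // Same shape as the slow-case loop: macro arguments resolve through
    // their spelling, everything else through the expansion start.
    ToyLoc fileLoc(ToyLoc loc) const {
        while (!isFileLoc(loc)) {
            auto it = expansions.find(loc);
            assert(it != expansions.end() && "unknown macro location");
            loc = it->second.isMacroArg ? it->second.spelling
                                        : it->second.expansionStart;
        }
        return loc;
    }
};
```

A plain file location comes back unchanged, a macro location resolves to the place where the macro was used, and a location nested inside another expansion takes more than one trip through the loop.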
kenny-liu answered Oct 30 '22 03:10