
Get all "malloc" calls using ASTMatcher

I'm trying to get all the malloc calls using ASTMatcher in clang. This is the code sample:

Finder.addMatcher(
    callExpr(
        hasParent(binaryOperator(hasOperatorName("=")).bind("assignment")),
        declRefExpr(to(functionDecl(hasName("malloc"))))
    ).bind("functionCall"),
    &HandlerForFunctionCall);

It compiles fine, but it does not match any malloc calls. How can I get all malloc calls using clang ASTMatcher?

asked Apr 01 '15 00:04 by user


1 Answer

The Problem

The signature of the malloc function is defined as follows:

void* malloc (size_t size);

To assign the return value of malloc to a pointer of any type other than void*, the value has to be converted. C++ requires you to cast explicitly, whereas a C compiler performs the conversion implicitly for you. So even if you write

int *a = malloc(sizeof(*a));

the compiler will implicitly convert the RHS expression. In effect, that is equivalent to

int *a = (int*) malloc(sizeof(*a));

You are using the hasParent traversal matcher, which only matches the direct parent and not any other ancestor. Because the cast node sits between the assignment and the call, your matcher will only match assignments without any kind of cast.
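The difference between a direct parent and an ancestor can be sketched with a toy tree. This is only an illustrative model in plain C, not Clang's API: the `node` struct, the `kind` enum, and both helper functions are hypothetical names made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy node kinds; a hypothetical stand-in for Clang's AST classes. */
enum kind { BINARY_OPERATOR, IMPLICIT_CAST_EXPR, CALL_EXPR };

struct node {
    enum kind kind;
    const struct node *parent; /* NULL at the root */
};

/* Mimics the idea of hasParent: only the immediate parent is inspected. */
bool has_parent_kind(const struct node *n, enum kind k) {
    return n->parent != NULL && n->parent->kind == k;
}

/* Mimics the idea of hasAncestor: walk upwards until a match or the root. */
bool has_ancestor_kind(const struct node *n, enum kind k) {
    for (const struct node *p = n->parent; p != NULL; p = p->parent) {
        if (p->kind == k) {
            return true;
        }
    }
    return false;
}

/* Shape of `a = malloc(...)` in C:
 * BinaryOperator > ImplicitCastExpr > CallExpr */
const struct node assignment = { BINARY_OPERATOR, NULL };
const struct node cast = { IMPLICIT_CAST_EXPR, &assignment };
const struct node call = { CALL_EXPR, &cast };
```

With this shape, `has_parent_kind(&call, BINARY_OPERATOR)` is false because the cast node is in the way, while `has_ancestor_kind(&call, BINARY_OPERATOR)` is true.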

Much the same thing happens with your declRefExpr. The C standard says that function designators automatically decay to pointers to functions. Clang therefore implicitly casts malloc to void *(*)(size_t), and that breaks your hierarchy of matchers.
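To make the two implicit conversions visible, here is a small sketch that writes them out by hand. The function names `alloc_as_written` and `alloc_spelled_out` are made up for this example, and the second function only approximates what the compiler does implicitly; it is not what Clang literally generates.

```c
#include <stddef.h>
#include <stdlib.h>

/* What you typically write in C: both conversions happen implicitly. */
int *alloc_as_written(void) {
    int *a = malloc(sizeof(*a));
    return a;
}

/* Roughly the same thing with the conversions spelled out by hand. */
int *alloc_spelled_out(void) {
    /* The function designator decays to a pointer to function. */
    void *(*fn)(size_t) = &malloc;
    /* The void * result is converted to int *. */
    int *a = (int *)fn(sizeof(*a));
    return a;
}
```

Both functions behave the same at run time. The difference only shows up in the AST, where each conversion becomes its own node between the nodes you might expect to be adjacent.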

Possible Solutions

That depends on what you actually want to do. First of all, you can generally fix the part selecting malloc calls by using this snippet:

callExpr(callee(functionDecl(hasName("malloc"))))

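The rest depends on what you want to select. If you are only interested in assignments whose right-hand side is the malloc call itself, with nothing but an implicit conversion in between, you can use the ignoringImpCasts matcher. Note that binaryOperator only matches assignment expressions such as a = malloc(sizeof(*a)); a declaration with an initializer, like int *a = malloc(sizeof(*a)); above, is a VarDecl in the AST and would need something like varDecl(hasInitializer(...)) instead. For some reason I have not been able to insert ignoringImpCasts into the matcher as you wrote it, so I inverted the matcher. It looks like this: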

binaryOperator(
  hasOperatorName("="),
  hasRHS(ignoringImpCasts(
    callExpr(
      callee(functionDecl(hasName("malloc")))
    ).bind("functionCall")
  ))
).bind("assignment")

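If you additionally want to include explicit casts like in the second example, use ignoringParenCasts instead, which strips parentheses as well as both implicit and explicit casts (ignoringParenImpCasts would still skip only implicit casts and parentheses):

binaryOperator(
  hasOperatorName("="),
  hasRHS(ignoringParenCasts(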
    callExpr(
      callee(functionDecl(hasName("malloc")))
    ).bind("functionCall")
  ))
).bind("assignment")

If you are interested in all assignments with arbitrary expressions containing malloc, use hasAncestor instead. It matches not only the direct parent but also any ancestor further up the tree:

callExpr(
  callee(functionDecl(hasName("malloc"))),
  hasAncestor(
    binaryOperator(hasOperatorName("=")).bind("assignment")
  )
).bind("functionCall")

One more thing. You are probably only interested in matching whatever is defined in your source code directly and not in included header files. Simply add unless(isExpansionInSystemHeader()) to your top-level matcher and it will exclude all definitions from system headers.

Note that this code has been tested with LLVM 3.7 and future changes might break it.
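A small input file is handy for trying out matchers like the ones in this section. The following is a sketch of such a file; the function name `run_cases` is made up, and the comments describe the AST shapes I would expect from the explanation above, so it is worth confirming them against an actual AST dump for your Clang version.

```c
#include <stddef.h>
#include <stdlib.h>

/* Returns how many of the four allocations succeeded, then frees them. */
int run_cases(size_t n) {
    int *a, *b, *c;

    /* Case 1: assignment with only an implicit conversion between
     * the BinaryOperator and the CallExpr. */
    a = malloc(n * sizeof(*a));

    /* Case 2: assignment with an explicit cast. A CStyleCastExpr sits
     * between the assignment and the call, so a matcher that strips
     * only implicit casts is not expected to reach the call. */
    b = (int *)malloc(n * sizeof(*b));

    /* Case 3: the call is nested inside a ConditionalOperator, so only
     * an ancestor-based matcher is expected to relate it to the '='. */
    c = (n > 0) ? malloc(n * sizeof(*c)) : NULL;

    /* Case 4: declaration with initializer. This is a VarDecl, not a
     * BinaryOperator, so assignment-based matchers are not expected
     * to match it at all. */
    int *d = malloc(n * sizeof(*d));

    int ok = (a != NULL) + (b != NULL) + (c != NULL) + (d != NULL);
    free(a);
    free(b);
    free(c);
    free(d);
    return ok;
}
```

The plain callExpr(callee(functionDecl(hasName("malloc")))) matcher should find the call in all four cases, which makes it a useful baseline when comparing the narrower variants.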

How to Debug

All right, so how the hell should we know all that? It turns out that Clang already provides all you need :) Specifically, there are two features you might be interested in.

When you invoke Clang with -Xclang -ast-dump -fsyntax-only, it will print out a pretty and colorful AST of the translation unit. Don't be surprised to find a huge preamble with all declarations from the system headers you included, as it has to run the preprocessor first to generate the AST. Example:

$ clang -Xclang -ast-dump -fsyntax-only example.c
...

`-FunctionDecl 0x3f2fc28 <line:19:1, line:31:1> line:19:5 main 'int ()'
  `-CompoundStmt 0x3f307b8 <line:20:1, line:31:1>
    |-BinaryOperator 0x3f2ff38 <line:22:3, col:29> 'int *' '='
    | |-DeclRefExpr 0x3f2fd40 <col:3> 'int *' lvalue Var 0x3f2f388 'a' 'int *'
    | `-ImplicitCastExpr 0x3f2ff20 <col:7, col:29> 'int *' <BitCast>
    |   `-CallExpr 0x3f2fef0 <col:7, col:29> 'void *'
    |     |-ImplicitCastExpr 0x3f2fed8 <col:7> 'void *(*)(unsigned long)' <FunctionToPointerDecay>
    |     | `-DeclRefExpr 0x3f2fd68 <col:7> 'void *(unsigned long)' Function 0x3f1cdd0 'malloc' 'void *(unsigned long)'
    |     `-BinaryOperator 0x3f2fe88 <col:15, col:28> 'unsigned long' '*'
    |       |-ImplicitCastExpr 0x3f2fe70 <col:15> 'unsigned long' <IntegralCast>
    |       | `-ImplicitCastExpr 0x3f2fe58 <col:15> 'int' <LValueToRValue>
    |       |   `-DeclRefExpr 0x3f2fd90 <col:15> 'int' lvalue Var 0x3f2f488 'n' 'int'
    |       `-UnaryExprOrTypeTraitExpr 0x3f2fe38 <col:19, col:28> 'unsigned long' sizeof
    |         `-ParenExpr 0x3f2fe18 <col:25, col:28> 'int' lvalue
    |           `-UnaryOperator 0x3f2fdf8 <col:26, col:27> 'int' lvalue prefix '*'
    |             `-ImplicitCastExpr 0x3f2fde0 <col:27> 'int *' <LValueToRValue>
    |               `-DeclRefExpr 0x3f2fdb8 <col:27> 'int *' lvalue Var 0x3f2f388 'a' 'int *'

...

And then there is clang-query which is built along with clang if you compile it from sources. It is an excellent example of libTooling and an absolutely amazing help in development at the same time. You simply run it on an example source file and use it to test your matchers (note that it implicitly binds "root" to the complete matcher):

$ <llvm>/bin/clang-query example.c --
clang-query> match callExpr(callee(functionDecl(hasName("malloc"))),hasAncestor(binaryOperator(hasOperatorName("=")).bind("assignment"))).bind("functionCall")

Match #1:

/vagrant/tests/true-valid-memsafety.c:22:3: note: "assignment" binds here
  a = malloc (n * sizeof(*a));
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/vagrant/tests/true-valid-memsafety.c:22:7: note: "functionCall" binds here
  a = malloc (n * sizeof(*a));
      ^~~~~~~~~~~~~~~~~~~~~~~
/vagrant/tests/true-valid-memsafety.c:22:7: note: "root" binds here
  a = malloc (n * sizeof(*a));
      ^~~~~~~~~~~~~~~~~~~~~~~

Match #2:

/vagrant/tests/true-valid-memsafety.c:23:3: note: "assignment" binds here
  b = malloc (n * sizeof(*b));
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/vagrant/tests/true-valid-memsafety.c:23:7: note: "functionCall" binds here
  b = malloc (n * sizeof(*b));
      ^~~~~~~~~~~~~~~~~~~~~~~
/vagrant/tests/true-valid-memsafety.c:23:7: note: "root" binds here
  b = malloc (n * sizeof(*b));
      ^~~~~~~~~~~~~~~~~~~~~~~
2 matches.

If you are interested in more information on that topic, head over to this excellent blog post by Eli Bendersky for a good overview and introduction. The complete documentation for AST matchers can be found here.

answered Sep 24 '22 14:09 by Jan Michael Auer