Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Static source code analysis with LLVM

I recently discover the LLVM (low level virtual machine) project, and from what I have heard It can be used to performed static analysis on a source code. I would like to know if it is possible to extract the different function call through function pointer (find the caller function and the callee function) in a program.

I could find the kind of information in the website so it would be really helpful if you could tell me if such an library already exist in LLVM or can you point me to the good direction on how to build it myself (existing source code, reference, tutorial, example...).

EDIT:

With my analysis I actually want to extract caller/callee function call. In the case of a function pointer, I would like to return a set of possible callee. both caller and callee must be define in the source code (this does not include third party function in a library).

like image 802
Phong Avatar asked Mar 01 '10 07:03

Phong


3 Answers

I think that Clang (the analyzer that is part of LLVM) is geared towards the detection of bugs, which means that the analyzer tries to compute possible values of some expressions (to reduce false positives) but it sometimes gives up (in this case, emitting no alarm to avoid a deluge of false positives).

If your program is C only, I recommend you take a look at the Value Analysis in Frama-C. It computes supersets of possible values for any l-value at each point of the program, under some hypotheses that are explained at length here. Complexity in the analyzed program only means that the returned supersets are more approximated, but they still contain all the possible run-time values (as long as you remain within the aforementioned hypotheses).

EDIT: if you are interested in possible values of function pointers for the purpose of slicing the analyzed program, you should definitely take a look at the existing dependencies and slicing computations in Frama-C. The website doesn't have any nice example for slicing, here is one from a discussion on the mailing-list

like image 82
Pascal Cuoq Avatar answered Oct 06 '22 02:10

Pascal Cuoq


You should take a look at Elsa. It is relatively easy to extend and lets you parse an AST fairly easily. It handles all of the parsing, lexing and AST generation and then lets you traverse the tree using the Visitor pattern.

class CallGraphGenerator : public ASTVisitor
{
  //...
  virtual bool visitFunction(Function *func);
  virtual bool visitExpression(Expression *expr);
}

You can then detect function declarations, and probably detect function pointer usage. Finally you could check the function pointers' declarations and generate a list of the declared functions that could have been called using that pointer.

like image 32
Gabe Avatar answered Oct 06 '22 00:10

Gabe


In our project, we perform static source code analysis by converting LLVM bytecode into C code with help of llc program that is shipped with LLVM. Then we analyze C code with CIL (C Intermediate Language), but for C language a lot of tools is available. The pitfail that the code generated by llc is AWFUL and suffers from a great loss of precision. But still, it's one way to go.

Edit: in fact, I wouldn't recommend anyone to o like this. But still, just for a record...

like image 24
P Shved Avatar answered Oct 06 '22 01:10

P Shved