Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Compute ingestible control flow graph from source code

I know that there are ways to automatically generate a CFG (C ontrol F low G raph) from source code. However, from what I understand, those methods give me a visual graph - an image. I can't really use such an image to do any computation with.

Thus my question: is there a way to automatically generate a CFG from source code such that the source code is returned to me in some data structure or file that is programmatically parseable? (ideally, I'd like to have access to the line numbers for each node/edge in the CFG as well)

I will be using this for a project that uses such a CFG to extract control flow paths to determine input path coverage (which I will solve using trace)

Important: The code that I am trying to analyze is written in python; and I want to perform the analysis with python

like image 815
inspectorG4dget Avatar asked Jan 09 '13 01:01

inspectorG4dget


2 Answers

Q: Is it possible to get control flow graph of the application with the llvm infrastructure in terms of basic blocks?

A:Yes, given a BasicBlock*, you can iterate over the pred/succ blocks in the CFG like this:

#include "llvm/Support/CFG.h"

   for (pred_iterator PI = pred_begin(BB), E = pred_end(BB); PI != E; ++PI) {
     BasicBlock *OnePredecessor = *PI;
     ...

For successors, s/pred/succ/

Q:Is there any way of viewing the Control flow graph of the application.

A:Yes, use: analyze -print-cfg X.(bc|ll) or: analyze -print-cfg-only X.(bc|ll)

These will emit "dot" files that can be turned into a variety of graphics formats with the graphviz toolset that you can get on the net.

http://lists.cs.uiuc.edu/pipermail/llvmdev/2005-June/004416.html That shows you how to programatically extract CFG AND using an auxiliary tool to print CFG to console.

like image 76
Realn0whereman Avatar answered Sep 22 '22 03:09

Realn0whereman


Have you looked into python's ast module. It won't do exactly what you're looking for, but you could probably build your own with greater ease using it.

Edit To answer the question in the comment:

If you save the following in test.py

def add(a, b):
    return a + b


def mult(a, b):
    result = 0
    for i in range(b):
        result += add(result, a)
    return result

Then do

import _ast


test = compile(open('test.py').read(), 'test.py', 'exec', _ast.PyCF_ONLY_AST)

Then you have test which is a Module class and which has a body attribute. This body attribute is a list of everything in the module, so in this example:

test.body  # >>> [<_ast.FunctionDef object at 0x...>, <_ast.FunctionDef object at 0x...>]
test.body[0].name  # >>> 'add'
(test.body[0].lineno, test.body[0].col_offset)  # >>> 1, 0
test.body[0].body  # >>> [<_ast.Return object at 0x...>]

test.body[1].body  # >>> [<_ast.Assign ...>, <_ast.For ...>, <_ast.Return ...>]
# etc.

You can find which classes you want to track and note that each statement has a different class associated with it.

It probably won't be as easy as I first thought but it shouldn't be horrible. Also, to do it iteratively you'll need the ast module which has the function iter_child_nodes, e.g.,

test2 = next(ast.iter_child_nodes(test))
test2 is test.body[0]
like image 30
Ian Stapleton Cordasco Avatar answered Sep 23 '22 03:09

Ian Stapleton Cordasco