I know that there are ways to automatically generate a CFG (C ontrol F low G raph) from source code. However, from what I understand, those methods give me a visual graph - an image. I can't really use such an image to do any computation with.
Thus my question: is there a way to automatically generate a CFG from source code such that the source code is returned to me in some data structure or file that is programmatically parseable? (ideally, I'd like to have access to the line numbers for each node/edge in the CFG as well)
I will be using this for a project that uses such a CFG to extract control flow paths to determine input path coverage (which I will solve using trace)
Important: The code that I am trying to analyze is written in python; and I want to perform the analysis with python
Q: Is it possible to get control flow graph of the application with the llvm infrastructure in terms of basic blocks?
A:Yes, given a BasicBlock*, you can iterate over the pred/succ blocks in the CFG like this:
#include "llvm/Support/CFG.h"
for (pred_iterator PI = pred_begin(BB), E = pred_end(BB); PI != E; ++PI) {
BasicBlock *OnePredecessor = *PI;
...
For successors, s/pred/succ/
Q:Is there any way of viewing the Control flow graph of the application.
A:Yes, use: analyze -print-cfg X.(bc|ll) or: analyze -print-cfg-only X.(bc|ll)
These will emit "dot" files that can be turned into a variety of graphics formats with the graphviz toolset that you can get on the net.
http://lists.cs.uiuc.edu/pipermail/llvmdev/2005-June/004416.html That shows you how to programatically extract CFG AND using an auxiliary tool to print CFG to console.
Have you looked into python's ast
module. It won't do exactly what you're looking for, but you could probably build your own with greater ease using it.
Edit To answer the question in the comment:
If you save the following in test.py
def add(a, b):
return a + b
def mult(a, b):
result = 0
for i in range(b):
result += add(result, a)
return result
Then do
import _ast
test = compile(open('test.py').read(), 'test.py', 'exec', _ast.PyCF_ONLY_AST)
Then you have test which is a Module class and which has a body attribute. This body attribute is a list of everything in the module, so in this example:
test.body # >>> [<_ast.FunctionDef object at 0x...>, <_ast.FunctionDef object at 0x...>]
test.body[0].name # >>> 'add'
(test.body[0].lineno, test.body[0].col_offset) # >>> 1, 0
test.body[0].body # >>> [<_ast.Return object at 0x...>]
test.body[1].body # >>> [<_ast.Assign ...>, <_ast.For ...>, <_ast.Return ...>]
# etc.
You can find which classes you want to track and note that each statement has a different class associated with it.
It probably won't be as easy as I first thought but it shouldn't be horrible. Also, to do it iteratively you'll need the ast
module which has the function iter_child_nodes
, e.g.,
test2 = next(ast.iter_child_nodes(test))
test2 is test.body[0]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With