I'm writing a concolic engine for Python using the sys.settrace()
functionality.
The main task during such kind of execution is to record the constraints on the input variables. The constraints are nothing else than the conditions of the if statements, that create two branches (the 'then' and the 'else' branch).
When an execution is complete, the engine chooses a constraint and finds appropriate values for the inputs so that the execution will go the down along the other branch (at execution x it goes the 'then' branch, at execution x+1 it goes along the 'else' branch).
This is to have a bit of context on why I doing what I'm trying to do...
By combining settrace()
and the dis
module, I get to see the bytecode of each source line, just before it is executed. This way I can easily record the if conditions as they appear during execution.
But then I have the big problem. I need to know which way the if went, which branch the execution took. So if my code is something like:
if x > a:
print x
else:
print a
at a certain point my tracing thing will see:
t: if x > 0:
then the python interpreter will execute the if and jump (or not) somewhere. And I will see:
t + 1: print x
So is the instruction t + 1
in the "then" branch or in the "else" one? Keep in mind the trace function sees only some bytecode in the current block.
I know of two way to do this. One is to evaluate the condition to see exactly whether it is true or false. This works only if there are no side-effects.
The other way is to try to look and the instruction pointer at t + 1
and try to understand where we are in the code. This is the way I am using right now, but it very delicate because at t + 1
I could find myself somewhere completely different (another module, a builtin function, etc).
So finally, the question I have is this: is there a way to get from Python itself, or from a C module/extension/whatever, the result of the last conditional jump?
In alternative, are there more fine-grained tracing options? Something like execute bytecode one opcode at a time. With the settrace()
functionality the maximum resolution I get is whole source code lines.
In the worst case, I think I can modify the Python interpreter to expose such information, but I would leave that as last resort, for obvious reasons.
The trace module allows you to trace program execution, generate annotated statement coverage listings, print caller/callee relationships and list functions executed during a program run. It can be used in another program or from the command line.
In Python, the new line character “\n” is used to create a new line. When inserted in a string all the characters after the character are added to a new line.
Using timeit module check the execution time This would give us the execution time of any program. This module provides a simple way to find the execution time of small bits of Python code. It provides the timeit() method to do the same. The module function timeit.
There is no information in the trace facility about the last branch taken.
What I did to implement branch coverage measurement in coverage.py is to keep a record for each stack frame of the last line executed, then the next time the trace function is called, I can record a pair of line numbers which form a from-to arc of execution.
About finer-grained tracing: you can trick the Python interpreter into giving you byte code information. My experiment in this is described here: Wicked hack: Python bytecode tracing
I'd be very interested to see how this work progresses!
In the end this is what I did. I implemented the AST instrumentation and it works pretty well.
By playing with the AST, you need to move all function calls (also attributes and subscriptions, due to getattr()
and friends, out of the if conditions by creating temporary variables. Also you need to split the and
and or
operators.
Then add a call to your own function at the beginning of each branch, with a boolean parameter, True
for the then branch and False
for the else branch.
After that I wrote an AST to source converter (there is one somewhere on the net, but does not work on current Python versions).
Working with the AST is very easy and quite simple, I ended up doing three transformation passes, adding also some import
statements.
This is the first pass, as an example. It splits if conditions if they contain or
or and
operators:
class SplitBoolOpPass1(ast.NodeTransformer):
def visit_If(self, node):
while isinstance(node.test, ast.BoolOp):
new_node = ast.If(test=node.test.values.pop(), body=node.body, orelse=node.orelse)
if isinstance(node.test.op, ast.And):
if len(node.test.values) == 1:
node.test = node.test.values[0]
node.body = [new_node]
else:
if len(node.test.values) == 1:
node.test = node.test.values[0]
node.orelse = [new_node]
node = self.generic_visit(node) # recusion
return node
Probably it is not very useful for code coverage applications because it messes up with the code quite a lot.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With