Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python tracing and conditional jumps

I'm writing a concolic engine for Python using the sys.settrace() functionality.

The main task during such kind of execution is to record the constraints on the input variables. The constraints are nothing else than the conditions of the if statements, that create two branches (the 'then' and the 'else' branch).

When an execution is complete, the engine chooses a constraint and finds appropriate values for the inputs so that the execution will go the down along the other branch (at execution x it goes the 'then' branch, at execution x+1 it goes along the 'else' branch).

This is to have a bit of context on why I doing what I'm trying to do...

By combining settrace() and the dis module, I get to see the bytecode of each source line, just before it is executed. This way I can easily record the if conditions as they appear during execution.

But then I have the big problem. I need to know which way the if went, which branch the execution took. So if my code is something like:

if x > a:
  print x
else:
  print a

at a certain point my tracing thing will see:

t: if x > 0: 

then the python interpreter will execute the if and jump (or not) somewhere. And I will see:

t + 1: print x

So is the instruction t + 1 in the "then" branch or in the "else" one? Keep in mind the trace function sees only some bytecode in the current block.

I know of two way to do this. One is to evaluate the condition to see exactly whether it is true or false. This works only if there are no side-effects.

The other way is to try to look and the instruction pointer at t + 1 and try to understand where we are in the code. This is the way I am using right now, but it very delicate because at t + 1 I could find myself somewhere completely different (another module, a builtin function, etc).

So finally, the question I have is this: is there a way to get from Python itself, or from a C module/extension/whatever, the result of the last conditional jump?

In alternative, are there more fine-grained tracing options? Something like execute bytecode one opcode at a time. With the settrace() functionality the maximum resolution I get is whole source code lines.

In the worst case, I think I can modify the Python interpreter to expose such information, but I would leave that as last resort, for obvious reasons.

like image 748
Daniele Venzano Avatar asked Aug 04 '11 15:08

Daniele Venzano


People also ask

What is tracing in Python?

The trace module allows you to trace program execution, generate annotated statement coverage listings, print caller/callee relationships and list functions executed during a program run. It can be used in another program or from the command line.

How do you jump to another line in Python?

In Python, the new line character “\n” is used to create a new line. When inserted in a string all the characters after the character are added to a new line.

How do I see the code execution in Python?

Using timeit module check the execution time This would give us the execution time of any program. This module provides a simple way to find the execution time of small bits of Python code. It provides the timeit() method to do the same. The module function timeit.


2 Answers

There is no information in the trace facility about the last branch taken.

What I did to implement branch coverage measurement in coverage.py is to keep a record for each stack frame of the last line executed, then the next time the trace function is called, I can record a pair of line numbers which form a from-to arc of execution.

About finer-grained tracing: you can trick the Python interpreter into giving you byte code information. My experiment in this is described here: Wicked hack: Python bytecode tracing

I'd be very interested to see how this work progresses!

like image 173
Ned Batchelder Avatar answered Sep 28 '22 10:09

Ned Batchelder


In the end this is what I did. I implemented the AST instrumentation and it works pretty well.

By playing with the AST, you need to move all function calls (also attributes and subscriptions, due to getattr() and friends, out of the if conditions by creating temporary variables. Also you need to split the and and or operators.

Then add a call to your own function at the beginning of each branch, with a boolean parameter, True for the then branch and False for the else branch.

After that I wrote an AST to source converter (there is one somewhere on the net, but does not work on current Python versions).

Working with the AST is very easy and quite simple, I ended up doing three transformation passes, adding also some import statements.

This is the first pass, as an example. It splits if conditions if they contain or or and operators:

class SplitBoolOpPass1(ast.NodeTransformer):
  def visit_If(self, node):
      while isinstance(node.test, ast.BoolOp):
        new_node = ast.If(test=node.test.values.pop(), body=node.body, orelse=node.orelse)
        if isinstance(node.test.op, ast.And):
          if len(node.test.values) == 1:
            node.test = node.test.values[0]
          node.body = [new_node]
        else:
          if len(node.test.values) == 1:
            node.test = node.test.values[0]
          node.orelse = [new_node]
      node = self.generic_visit(node) # recusion
      return node

Probably it is not very useful for code coverage applications because it messes up with the code quite a lot.

like image 25
Daniele Venzano Avatar answered Sep 28 '22 09:09

Daniele Venzano