Debugging parallel Python programs (mpi4py)

I have an mpi4py program that hangs intermittently. How can I trace what the individual processes are doing?

I can run each process in its own terminal under pdb, for example:

mpiexec -n 4 xterm -e "python -m pdb my_program.py"

But this gets cumbersome when the issue only shows up with a large number of processes (~80 in my case). In addition, pdb makes it easy to catch exceptions, but to find out where a hang occurs I would need to see the execution trace of each process.

teekarna asked Oct 20 '17 19:10


1 Answer

The Python trace module allows you to trace program execution. To store the trace of each process separately, wrap your code in a function, because the tracer is given a callable to run:

def my_program(*args, **kwargs):
    # insert your code here
    pass

And then run it with trace.Trace.runfunc:

import sys
import trace

from mpi4py import MPI

# define Trace object: print each executed line, exclude some modules
tracer = trace.Trace(
    ignoredirs=[sys.prefix, sys.exec_prefix],
    ignoremods=[
        'inspect', 'contextlib', '_bootstrap',
        '_weakrefset', 'abc', 'posixpath', 'genericpath', 'textwrap'
    ],
    trace=1,
    count=0)

# by default the trace goes to stdout;
# redirect it to a different file for each process (one per rank).
# buffering=1 selects line buffering, so the last lines before a hang reach the file
sys.stdout = open('trace_{:04d}.txt'.format(MPI.COMM_WORLD.rank), 'w', buffering=1)

tracer.runfunc(my_program)

Now the trace of each process will be written to a separate file, named after its rank: trace_0000.txt, trace_0001.txt, etc. Use the ignoredirs and ignoremods arguments to omit low-level calls.
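
The same idea can be packaged as a small helper that restores stdout and closes the file when the traced function returns. This is only a sketch: run_traced, its rank and outdir parameters, and the toy my_program are hypothetical names introduced for illustration. The rank is passed in as a plain integer so the example runs without MPI.

```python
import contextlib
import os
import sys
import tempfile
import trace


def my_program(n):
    # hypothetical stand-in for the real workload
    total = 0
    for i in range(n):
        total += i
    return total


def run_traced(func, rank, *args, outdir=".", **kwargs):
    """Run func under trace.Trace and write the line trace to a per-rank file."""
    tracer = trace.Trace(
        ignoredirs=[sys.prefix, sys.exec_prefix],
        trace=1,
        count=0,
    )
    path = os.path.join(outdir, "trace_{:04d}.txt".format(rank))
    # Line-buffered file: each traced line is flushed as it is printed,
    # which matters when the process hangs and never exits cleanly.
    with open(path, "w", buffering=1) as f, contextlib.redirect_stdout(f):
        result = tracer.runfunc(func, *args, **kwargs)
    return result, path


if __name__ == "__main__":
    # rank is hard-coded here; see the note below for the MPI case
    result, path = run_traced(my_program, 0, 5, outdir=tempfile.mkdtemp())
    sys.stderr.write("result={} trace written to {}\n".format(result, path))
```

In an mpi4py program, the rank argument would presumably be MPI.COMM_WORLD.rank, as in the answer above. For a process that hangs, the last lines of its trace file show the statement it most recently started executing.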

teekarna answered Sep 30 '22 15:09