Debugging parallel Python programs (mpi4py)

I have an mpi4py program that hangs intermittently. How can I trace what the individual processes are doing?

I can run the program in separate terminals, for example using pdb:

mpiexec -n 4 xterm -e "python -m pdb my_program.py"

But this gets cumbersome if the issue only manifests with a large number of processes (~80 in my case). In addition, it's easy to catch exceptions with pdb, but to figure out where the hang occurs I'd need to see the execution trace.

The Python trace module allows you to trace program execution. In order to store the trace of each process separately, you need to wrap your code in a function:

def my_program(*args, **kwargs):
    # insert your code here
    pass

And then run it with trace.Trace.runfunc:

import sys
import trace

from mpi4py import MPI

# define Trace object: trace line numbers at runtime, exclude some modules
tracer = trace.Trace(
    ignoredirs=[sys.prefix, sys.exec_prefix],
    ignoremods=[
        'inspect', 'contextlib', '_bootstrap',
        '_weakrefset', 'abc', 'posixpath', 'genericpath', 'textwrap'
    ],
    trace=1,
    count=0)

# by default the trace goes to stdout;
# redirect it to a different file for each process
sys.stdout = open('trace_{:04d}.txt'.format(MPI.COMM_WORLD.rank), 'w')

tracer.runfunc(my_program)

Now the trace of each process will be written to a separate file (trace_0000.txt, trace_0001.txt, etc.). Use the ignoredirs and ignoremods arguments to omit low-level calls.
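
For what it's worth, here is a slightly more defensive sketch of the same idea. It is not part of the original recipe: run_traced and last_traced_lines are hypothetical helper names, and the rank is passed in as a plain integer (in an MPI run you would presumably pass MPI.COMM_WORLD.rank). The file is opened line-buffered so that its contents should be up to date even if the process hangs, and stdout is restored once the traced function returns.

```python
import contextlib
import glob
import os
import sys
import trace


def run_traced(func, rank, out_dir="."):
    """Run func under trace, writing the line trace to a per-rank file.

    Hypothetical helper: `rank` is passed explicitly; under MPI it is
    assumed to come from MPI.COMM_WORLD.rank.
    """
    tracer = trace.Trace(
        ignoredirs=[sys.prefix, sys.exec_prefix],
        trace=1,
        count=0)
    path = os.path.join(out_dir, 'trace_{:04d}.txt'.format(rank))
    # buffering=1 selects line buffering in text mode, so each traced
    # line reaches the file as soon as it is printed
    with open(path, 'w', buffering=1) as f, contextlib.redirect_stdout(f):
        tracer.runfunc(func)
    return path


def last_traced_lines(out_dir="."):
    """Map each trace file name to its last non-empty line (or None)."""
    result = {}
    for path in sorted(glob.glob(os.path.join(out_dir, 'trace_*.txt'))):
        last = None
        with open(path) as f:
            for line in f:
                if line.strip():
                    last = line.rstrip('\n')
        result[os.path.basename(path)] = last
    return result
```

After a hang, calling last_traced_lines on the output directory gives a quick overview of the last line each rank executed, which is usually where to start looking (for example, a collective call that some ranks never reached).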

Tags: python, debugging, parallel-processing, trace, mpi4py

teekarna
