I have an mpi4py program that hangs intermittently. How can I trace what the individual processes are doing?
I can run the program in different terminals, for example using pdb:
mpiexec -n 4 xterm -e "python -m pdb my_program.py"
But this gets cumbersome if the issue only manifests with a large number of processes (~80 in my case). In addition, it's easy to catch exceptions with pdb, but I'd need to see the trace to figure out where the hang occurs.
Starting the Python debugger: to start debugging from within the program, insert import pdb; pdb.set_trace() at the point of interest. Run your script normally, and execution will pause at the line following the set_trace() call. In effect, this hard-codes a breakpoint on the line below the call.
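As a rough illustration of the hard-coded breakpoint described above, here is a minimal sketch. The function accumulate and the environment variable DEBUG_BREAK are hypothetical names made up for this example; the guard is only there so the breakpoint can be switched off without editing the code.

```python
import os
import pdb


def accumulate(values):
    """Sum a list, optionally pausing in pdb before the loop starts."""
    total = 0
    # DEBUG_BREAK is a hypothetical switch used only in this sketch.
    if os.environ.get("DEBUG_BREAK") == "1":
        # Execution pauses at the next line; in the pdb prompt,
        # 'n' steps to the next line and 'c' continues.
        pdb.set_trace()
    for v in values:
        total += v
    return total
```

Running the script with DEBUG_BREAK=1 set in the environment would drop into the pdb prompt just before the loop; without it, the function runs straight through.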
Basic debugging: if you're only interested in debugging a Python script in VS Code, the simplest way is to select the down-arrow next to the run button on the editor and select Debug Python File in Terminal.
But when it comes to Python, debugging “out of the box” is a little cruder and more primitive; single-step debugging is the main way to debug Python code, and it is quite slow and clunky. It's often easier to use print statements; Python's creator, Guido van Rossum, reportedly uses them for 90 percent of his debugging.
The Python trace module allows you to trace program execution. In order to store the trace of each process separately, you need to wrap your code in a function:
def my_program(*args, **kwargs):
    # insert your code here
    pass
And then run it with trace.Trace.runfunc:
import sys
import trace

from mpi4py.MPI import COMM_WORLD

# define Trace object: trace line numbers at runtime, exclude some modules
tracer = trace.Trace(
    ignoredirs=[sys.prefix, sys.exec_prefix],
    ignoremods=[
        'inspect', 'contextlib', '_bootstrap',
        '_weakrefset', 'abc', 'posixpath', 'genericpath', 'textwrap'
    ],
    trace=1,
    count=0)

# by default the trace goes to stdout;
# redirect it to a different file for each process
# (line-buffered, so the last traced lines reach the file even if the process hangs)
sys.stdout = open('trace_{:04d}.txt'.format(COMM_WORLD.rank), 'w', buffering=1)
tracer.runfunc(my_program)
Now the trace of each process will be written to a separate file, named by rank: trace_0000.txt, trace_0001.txt, etc. Use the ignoredirs and ignoremods arguments to omit low-level calls.
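Once the per-process files exist, the last line of each one is usually the most interesting part when hunting a hang, since it shows roughly where that process stopped. Below is a minimal sketch of a helper that collects those lines. The function name last_traced_lines is hypothetical, and it assumes the files follow the trace_*.txt naming used above. Note that if the output was block-buffered, a file may lag behind the position the process actually reached.

```python
import glob


def last_traced_lines(pattern="trace_*.txt"):
    """Map each trace file matching `pattern` to its last non-empty line."""
    result = {}
    for path in sorted(glob.glob(pattern)):
        last = ""
        # Reads the whole file; for very large traces, seeking from
        # the end of the file would be faster.
        with open(path, errors="replace") as fh:
            for line in fh:
                if line.strip():
                    last = line.rstrip("\n")
        result[path] = last
    return result


if __name__ == "__main__":
    for path, line in last_traced_lines().items():
        print(f"{path}: {line}")
```

If most ranks end on the same collective call while one or two end somewhere else, those outliers are a reasonable place to start looking.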