I have to run a legacy Zope2 website and have some grievances with it. The biggest issue is that it occasionally locks up, running at 100% CPU load and no longer answering requests. The problem isn't reliably reproducible, but one page containing 3 dynamic graphs sometimes triggers it, so I suspect some kind of race condition that leads to an endless loop or a stuck busy-wait.
The problem is, I have not yet found a way to debug this thing. There's nothing in the Zope logs and nothing in the system logs. I tried the suggestions from this question to get a stack trace, but the only signal that has any effect is SIGKILL.
Is there another possibility to find out where exactly the process is when it gets stuck?
You can print out a nice stack trace using pyrasite.
First, you'll need to have gdb installed.
# Redhat, CentOS, etc
$ yum install gdb
# Ubuntu, Debian, etc
$ apt-get update && apt-get install gdb
Then, install pyrasite.
$ pip install pyrasite
Use ps or some other method to find the process ID of the stuck Python process, then run pyrasite-shell with it.
# Assuming process ID is 12345
$ pyrasite-shell 12345
You should now see a python REPL. Run the following in the REPL to see stack traces for all threads.
import sys, traceback
for thread_id, frame in sys._current_frames().items():
    print('Stack for thread {}'.format(thread_id))
    traceback.print_stack(frame)
    print('')
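If the REPL output is hard to read, the same loop can be wrapped in a helper that returns the text and labels each thread by name. This is only a minimal sketch; `format_all_stacks` is a hypothetical name and not part of pyrasite.

```python
import sys
import threading
import traceback


def format_all_stacks():
    """Return one string with the current stack of every thread."""
    # Map thread ids to names so the output is easier to read.
    names = dict((t.ident, t.name) for t in threading.enumerate())
    chunks = []
    for thread_id, frame in sys._current_frames().items():
        name = names.get(thread_id, "unknown")
        chunks.append("Stack for thread {0} ({1})\n".format(thread_id, name))
        chunks.append("".join(traceback.format_stack(frame)))
        chunks.append("\n")
    return "".join(chunks)


if __name__ == "__main__":
    sys.stdout.write(format_all_stacks())
```

Inside the pyrasite shell, you would paste the function and then run `print(format_all_stacks())`.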
See my answer to this SO question: use Products.signalstack. It registers the same handler as the answer you already found, but at Product registration time. Perhaps it works better for you.
If not, you probably have an OS-level I/O problem on your hands, and your only hope is attaching gdb to the process. Search Stack Overflow for gdb answers; there is a wealth of information here!
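For reference, a signal-based stack dump looks roughly like the sketch below. This is not the actual Products.signalstack code; the handler name `dump_stacks` and the choice of SIGUSR1 are assumptions made for illustration.

```python
import signal
import sys
import traceback


def dump_stacks(signum, frame):
    """Signal handler: write the stack of every thread to stderr."""
    for thread_id, stack in sys._current_frames().items():
        sys.stderr.write("Stack for thread {0}\n".format(thread_id))
        traceback.print_stack(stack, file=sys.stderr)
        sys.stderr.write("\n")


# Assumption: SIGUSR1 is not used for anything else in this process (Unix only).
signal.signal(signal.SIGUSR1, dump_stacks)
```

With this in place, `kill -USR1 <PID>` should print the stacks. Python-level handlers only run in the main thread, between bytecode instructions, so a main thread stuck inside C code never gets to run the handler. That may be why only SIGKILL has any effect here.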
While pyrasite might work, it does not handle some corner cases and can hang or fail silently.
If the package does not work as expected, you can manually do what it does under the hood to figure out what went wrong.
gdb -p <PID>
(may need sudo.)
set $gstate = PyGILState_Ensure()
call PyRun_SimpleString(" <some Python code> ")
call PyGILState_Release($gstate)
See the Python C API documentation for these functions.
If Python was not compiled with debug symbols, the function types have to be given explicitly.
According to the Python source code https://github.com/python/cpython/blob/4fe5585240f64c3d14eb635ff82b163f92074b3a/Include/pystate.h#L86-L88 , the type PyGILState_STATE is an enum with 2 values, so we "guess" that we can use int (although it may not work).
In conclusion, according to the documentation, the "correct (subject to the restriction above)" commands for the functions are
set $gstate = ((int (*)()) PyGILState_Ensure ) ()
call ((int (*)(const char*)) PyRun_SimpleString) (" <some Python code> ")
call ((void(*)(int)) PyGILState_Release) ($gstate)
This solution does not rely on the Python-debugging extension for gdb. If that extension is available, it's possible to simply run py-bt.
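The `<some Python code>` placeholder has to be written as a C string literal, which gets awkward for multi-line code. The sketch below shows one way to build the gdb invocation from Python, using the explicitly typed commands. The helper names, the payload, and the output path `/tmp/stacks.txt` are all hypothetical, and the result carries the same "may not work" caveat as the manual commands.

```python
def gdb_string(code):
    """Escape Python source so it fits inside a gdb double-quoted string."""
    return (code.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def build_gdb_command(pid, code):
    """Return an argv list that runs `code` inside process `pid` via gdb."""
    payload = gdb_string(code)
    return [
        "gdb", "-p", str(pid), "-batch",
        "-ex", "set $gstate = ((int (*)()) PyGILState_Ensure)()",
        "-ex", 'call ((int (*)(const char*)) PyRun_SimpleString)("{0}")'.format(payload),
        "-ex", "call ((void (*)(int)) PyGILState_Release)($gstate)",
    ]


# Hypothetical payload: write all thread stacks to a file, because the
# target's stdout/stderr may not be visible to you.
PAYLOAD = """import sys, traceback
with open('/tmp/stacks.txt', 'w') as out:
    for thread_id, frame in sys._current_frames().items():
        out.write('Stack for thread {0}\\n'.format(thread_id))
        traceback.print_stack(frame, file=out)
"""
```

The resulting list can be passed to `subprocess.call(build_gdb_command(12345, PAYLOAD))`, possibly with `sudo` prepended.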
I maintain a more up-to-date fork of pyrasite, (currently) named pyrasite-ng. Any bugs can be reported there, and hopefully I can fix them quickly.
You could try to attach a debugger to the running process. See also this question.