Specifically, I'm running a Flask app with default workers in gunicorn. I'm trying to figure out how to debug / trace what is happening when a worker is killed due to timeout while serving a request. Is there a way to get a stack trace or profile the request to debug this?
Gunicorn documentation about timeout Generally set to thirty seconds. Only set this noticeably higher if you're sure of the repercussions for sync workers. For the non sync workers it just means that the worker process is still communicating and is not tied to the length of time required to handle a single request.
¶ Since Gunicorn 19, a threads option can be used to process requests in multiple threads. Using threads assumes use of the gthread worker.
In version 19.0, Gunicorn doesn't log by default in the console. To watch the logs in the console you need to use the option --log-file=- . In version 19.2, Gunicorn logs to the console by default again.
Honestly a very good question, I was not sure if it is even possible to achieve the same or not. But as I started digging I found lot of interesting threads
Showing the stack trace from a running Python application
Get stacktrace from stuck python process
https://github.com/khamidou/lptrace
https://gist.github.com/reywood/e221c4061bbf2eccea885c9b2e4ef496
So first I created a simple flask app with below code
app.py
from flask import Flask
app = Flask(__name__)
import time
def a():
b()
def b():
c ()
def c():
i = 0
while i < 900:
time.sleep(1)
i += 1
@app.route('/', defaults={'path': ''})
@app.route('/<path:path>')
def catch_all(path):
a()
return 'You want path: %s' % path
if __name__ == '__main__':
app.run()
wsgi.py
from app import app
if __name__ == "__main__":
app.run()
Now running the app like below and doing curl localhost:8000/abc
$ gunicorn wsgi:app
[2019-08-01 08:19:06 +0000] [26825] [INFO] Starting gunicorn 19.9.0
[2019-08-01 08:19:06 +0000] [26825] [INFO] Listening at: http://127.0.0.1:8000 (26825)
[2019-08-01 08:19:06 +0000] [26825] [INFO] Using worker: sync
[2019-08-01 08:19:06 +0000] [26828] [INFO] Booting worker with pid: 26828
[2019-08-01 08:19:40 +0000] [26825] [CRITICAL] WORKER TIMEOUT (pid:26828)
[2019-08-01 08:19:40 +0000] [26828] [INFO] Worker exiting (pid: 26828)
[2019-08-01 08:19:40 +0000] [26832] [INFO] Booting worker with pid: 26832
Now what we need is a hook which can be called before the worked is killed. gunicorn
supports server events in the configuration file
So now we create a config file
gunicorn_config.py
timeout = 3
def worker_abort(worker):
pid = worker.pid
print("worker is being killed - {}".format(pid))
And our output is now
$ gunicorn -c gunicorn_config.py wsgi:app
[2019-08-01 08:22:17 +0000] [26837] [INFO] Starting gunicorn 19.9.0
[2019-08-01 08:22:17 +0000] [26837] [INFO] Listening at: http://127.0.0.1:8000 (26837)
[2019-08-01 08:22:17 +0000] [26837] [INFO] Using worker: sync
[2019-08-01 08:22:17 +0000] [26840] [INFO] Booting worker with pid: 26840
[2019-08-01 08:22:22 +0000] [26837] [CRITICAL] WORKER TIMEOUT (pid:26840)
worker is being killed - 26840
[2019-08-01 08:22:22 +0000] [26840] [INFO] Worker exiting (pid: 26840)
[2019-08-01 08:22:22 +0000] [26844] [INFO] Booting worker with pid: 26844
This is good, now we need mix our previous knowledge on pyrasite
and this to get the stack. So we updated the config file like below
gunicorn_config.py
timeout = 3
__code_dump_stack__ = """
import sys, traceback
for thread, frame in sys._current_frames().items():
print('Thread 0x%x' % thread)
traceback.print_stack(frame)
print()
"""
def dump_stack_for_process(pid):
import pyrasite
ipc = pyrasite.PyrasiteIPC(pid)
ipc.connect()
print(ipc.cmd(__code_dump_stack__))
ipc.close()
def worker_abort(worker):
pid = worker.pid
print("worker is being killed - {}".format(pid))
dump_stack_for_process(pid)
And now our output is
$ [2019-08-01 08:25:29 +0000] [26848] [INFO] Starting gunicorn 19.9.0
[2019-08-01 08:25:29 +0000] [26848] [INFO] Listening at: http://127.0.0.1:8000 (26848)
[2019-08-01 08:25:29 +0000] [26848] [INFO] Using worker: sync
[2019-08-01 08:25:29 +0000] [26851] [INFO] Booting worker with pid: 26851
[2019-08-01 08:25:38 +0000] [26848] [CRITICAL] WORKER TIMEOUT (pid:26851)
worker is being killed - 26851
Thread 0x7ff0a7a4b700
File "/usr/lib/python3.5/threading.py", line 882, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "<string>", line 72, in run
File "<string>", line 92, in on_command
File "<string>", line 6, in <module>
Thread 0x7ff0ac512700
File "/home/vagrant/.local/bin/gunicorn", line 11, in <module>
sys.exit(run())
File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/app/wsgiapp.py", line 61, in run
WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/app/base.py", line 223, in run
super(Application, self).run()
File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/app/base.py", line 72, in run
Arbiter(self).run()
File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/arbiter.py", line 203, in run
self.manage_workers()
File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/arbiter.py", line 545, in manage_workers
self.spawn_workers()
File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/arbiter.py", line 616, in spawn_workers
self.spawn_worker()
File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
worker.init_process()
File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/base.py", line 134, in init_process
self.run()
File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 124, in run
self.run_for_one(timeout)
File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 68, in run_for_one
self.accept(listener)
File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 30, in accept
self.handle(listener, client, addr)
File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 135, in handle
self.handle_request(listener, req, client, addr)
File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 176, in handle_request
respiter = self.wsgi(environ, resp.start_response)
File "/home/vagrant/.local/lib/python3.5/site-packages/flask/app.py", line 2463, in __call__
return self.wsgi_app(environ, start_response)
File "/home/vagrant/.local/lib/python3.5/site-packages/flask/app.py", line 2446, in wsgi_app
response = self.full_dispatch_request()
File "/home/vagrant/.local/lib/python3.5/site-packages/flask/app.py", line 1949, in full_dispatch_request
rv = self.dispatch_request()
File "/home/vagrant/.local/lib/python3.5/site-packages/flask/app.py", line 1935, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/home/vagrant/remotedebug/app.py", line 20, in catch_all
a()
File "/home/vagrant/remotedebug/app.py", line 6, in a
b()
File "/home/vagrant/remotedebug/app.py", line 9, in b
c ()
File "/home/vagrant/remotedebug/app.py", line 14, in c
time.sleep(1)
File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/base.py", line 195, in handle_abort
self.cfg.worker_abort(self)
File "gunicorn_config.py", line 23, in worker_abort
dump_stack_for_process(pid)
File "gunicorn_config.py", line 17, in dump_stack_for_process
print(ipc.cmd(__code_dump_stack__))
File "/home/vagrant/.local/lib/python3.5/site-packages/pyrasite/ipc.py", line 161, in cmd
return self.recv()
File "/home/vagrant/.local/lib/python3.5/site-packages/pyrasite/ipc.py", line 174, in recv
header_data = self.recv_bytes(4)
File "/home/vagrant/.local/lib/python3.5/site-packages/pyrasite/ipc.py", line 187, in recv_bytes
chunk = self.sock.recv(n - len(data))
[2019-08-01 08:25:38 +0000] [26851] [INFO] Worker exiting (pid: 26851)
[2019-08-01 08:25:38 +0000] [26862] [INFO] Booting worker with pid: 26862
The stack trace is large but it gives us what we need
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With