After researching Python daemons, I found this walkthrough to be the most robust: http://www.jejik.com/articles/2007/02/a_simple_unix_linux_daemon_in_python/
Now I am trying to implement a pool of workers inside the daemon class. I believe it is working (I have not thoroughly tested the code), except that on close I get a zombie process. I have read that I need to wait for the return code from the child, but I just cannot see exactly how to do this yet.
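Waiting for a child's return code is done with os.waitpid(): until the parent calls it, a child that has exited stays in the process table as a zombie. The sketch below is only an illustration on a POSIX system; spawn_child, reap and reap_finished are hypothetical helpers, not part of the daemon recipe linked above. reap() blocks for one specific child, and reap_finished() uses os.WNOHANG to collect any children that have already ended without blocking.

```python
import os
import time


def spawn_child(exit_code, delay=0.2):
    """Fork a throwaway child that sleeps briefly and exits with exit_code."""
    pid = os.fork()
    if pid == 0:  # in the child
        time.sleep(delay)
        os._exit(exit_code)  # exit immediately, skipping the parent's cleanup handlers
    return pid  # in the parent


def reap(pid):
    """Block until child pid has ended, then describe how it ended.

    Until this waitpid() call, an exited child remains a zombie.
    """
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return ("exited", os.WEXITSTATUS(status))
    if os.WIFSIGNALED(status):
        return ("killed", os.WTERMSIG(status))
    return ("other", status)


def reap_finished():
    """Non-blocking variant: collect every child that has already ended."""
    reaped = []
    while True:
        try:
            pid, _ = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children left at all
        if pid == 0:
            break  # children remain, but none has ended yet
        reaped.append(pid)
    return reaped
```

For example, after os.kill(pid, signal.SIGTERM) on a child, reap(pid) reports ("killed", 15) and the zombie disappears.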
Here are some code snippets:
def stop(self):
    ...
    try:
        while 1:
            self.pool.close()
            self.pool.join()
            os.kill(pid, SIGTERM)
            time.sleep(0.1)
    ...
Here I have tried os.killpg and a number of os.wait methods, but with no improvement. I have also played with closing/joining the pool before and after the os.kill. As it stands, this loop never ends, and as soon as it hits the os.kill I get a zombie process.

self.pool = Pool(processes=4) occurs in the __init__ section of the daemon. From run(self), which is executed after start(self), I will call self.pool.apply_async(self.runCmd, [cmd, 10], callback=self.logOutput). However, I wanted to address this zombie process before looking into that.
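A minimal sketch of the pool usage described above, under several assumptions. run_cmd and CommandRunner are hypothetical stand-ins for the question's runCmd and daemon class, and the second apply_async argument (10) is assumed to be a timeout in seconds. run_cmd is a module-level function because Pool objects cannot be pickled, so submitting a bound method of an object that holds a Pool may fail. The sketch also assumes a POSIX system with the fork start method, and it creates the pool inside run() rather than __init__ so that the pool belongs to the process that actually does the work. The pool is closed and joined exactly once, after all tasks are submitted; once join() returns, the workers have exited and been waited for.

```python
import multiprocessing
import subprocess


def run_cmd(cmd, timeout):
    """Hypothetical stand-in for runCmd: run cmd and return (returncode, stdout)."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return proc.returncode, proc.stdout.strip()


class CommandRunner:
    def __init__(self, processes=4):
        self.processes = processes
        self.results = []

    def log_output(self, result):
        # Callback: runs in a helper thread of the parent process, not in a worker.
        self.results.append(result)

    def run(self, commands):
        ctx = multiprocessing.get_context("fork")  # assumption: POSIX, fork start method
        pool = ctx.Pool(processes=self.processes)
        try:
            for cmd in commands:
                pool.apply_async(run_cmd, [cmd, 10], callback=self.log_output)
        finally:
            pool.close()  # accept no new tasks
            pool.join()   # wait for queued tasks and for the worker processes to exit
        return self.results
```

For example, CommandRunner(2).run([[sys.executable, "-c", "print('hi')"]]) returns [(0, 'hi')].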
How can I properly implement the pool inside the daemon to avoid this zombie process?
It is not possible to be 100% confident in an answer without knowing what is going on in the child/daemon process, but consider whether this could be it. Since you have worker threads in your child process, you need to build in some logic to join all of those threads once you receive SIGTERM. Otherwise your process may not exit (and even if it does, it may not exit gracefully). To do this you need to:
If you have threads for I/O and all kinds of other things, this will be a real chore.
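A minimal sketch of that shutdown logic, assuming plain threading.Thread workers. stop_event, handle_sigterm, worker and serve are hypothetical names. The handler only sets a flag, each worker re-checks the flag between units of work, and the main thread waits with a timeout before joining every thread. Note that signal.signal() must be called from the main thread.

```python
import signal
import threading

stop_event = threading.Event()


def handle_sigterm(signum, frame):
    # Keep the handler tiny: just record that shutdown was requested.
    stop_event.set()


def worker(worker_id, finished):
    # Hypothetical worker loop: do a slice of work, then re-check the flag.
    while not stop_event.is_set():
        stop_event.wait(0.05)  # stands in for real work; returns early once the flag is set
    finished.append(worker_id)


def serve(num_threads=4):
    signal.signal(signal.SIGTERM, handle_sigterm)  # only allowed in the main thread
    finished = []
    threads = [threading.Thread(target=worker, args=(i, finished)) for i in range(num_threads)]
    for t in threads:
        t.start()
    # Wait with a timeout so the main thread regularly returns to the interpreter,
    # which is where the Python-level signal handler gets to run.
    while not stop_event.is_set():
        stop_event.wait(0.1)
    for t in threads:
        t.join()  # each thread has seen the flag and returned
    return finished
```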
Also, I have found through experimentation that the particular strategy for your event listener matters when you are using signal handlers. For example, if you use select.select(), you must use a timeout and retry when it expires; otherwise your signal handler will not run. If you have a Queue.Queue object for events and your event listener calls its .get() method, you must use a timeout, otherwise your signal handler will not run. (The "real" signal handler implemented in C within the interpreter runs, but your Python signal handler does not unless you use timeouts.)
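A minimal sketch of the two listener styles mentioned above, written for Python 3 (where the Queue module is named queue). The function names and the stop_event parameter (assumed to be a threading.Event that a signal handler sets) are illustrative. Each blocking call gets a short timeout, and the loop re-checks the shutdown flag every time the timeout expires.

```python
import queue
import select


def drain_queue(q, stop_event, timeout=0.1):
    """Consume events from q until stop_event is set, using a timeout on get()."""
    handled = []
    while not stop_event.is_set():
        try:
            item = q.get(timeout=timeout)
        except queue.Empty:
            continue  # timed out: go round again and re-check the shutdown flag
        handled.append(item)
    return handled


def wait_readable(sock, stop_event, timeout=0.1):
    """Wait for sock to become readable, retrying select() whenever it times out."""
    while not stop_event.is_set():
        readable, _, _ = select.select([sock], [], [], timeout)
        if readable:
            return True
    return False  # shutdown was requested before any data arrived
```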
Good luck!