I'm trying to build a Python daemon that launches other, fully independent processes.
The general idea is: given a shell command, poll every few seconds and ensure that exactly k instances of the command are running. We keep a directory of pidfiles; on each poll we remove pidfiles whose pids are no longer running, and start up (and write pidfiles for) however many processes we need to get back to k.
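Concretely, one poll pass would look something like this (a simplified sketch; poll_once and start_proc are illustrative names, not part of my actual code):

```python
import errno
import os

def poll_once(pid_dir, k, start_proc):
    """One poll pass: drop stale pidfiles, then top up to k instances.
    start_proc() must launch one instance and return its pid."""
    def alive(pid):
        try:
            os.kill(pid, 0)       # signal 0: existence check, sends nothing
            return True
        except OSError as e:
            return e.errno != errno.ESRCH

    live = 0
    for name in os.listdir(pid_dir):
        pid = int(name.split('.')[0])
        if alive(pid):
            live += 1
        else:
            os.unlink(os.path.join(pid_dir, name))   # stale pidfile
    for _ in range(k - live):
        pid = start_proc()
        with open(os.path.join(pid_dir, '%d.pid' % pid), 'w') as f:
            f.write(str(pid))
```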
The child processes also need to be fully independent, so that if the parent process dies the children aren't killed with it. From what I've read, it seems there is no way to do this with the subprocess module. To this end, I used the double-fork snippet from here:
http://code.activestate.com/recipes/66012-fork-a-daemon-process-on-unix/
I made a couple of necessary modifications (you'll see them as the commented-out lines in the snippet below).
Here's my spawn function and a test:
import os
import sys
import subprocess
import time

def spawn(cmd, child_cwd):
    """
    do the UNIX double-fork magic, see Stevens' "Advanced
    Programming in the UNIX Environment" for details (ISBN 0201563177)
    http://www.erlenstar.demon.co.uk/unix/faq_2.html#SEC16
    """
    try:
        pid = os.fork()
        if pid > 0:
            # exit first parent
            #sys.exit(0) # parent daemon needs to stay alive to launch more in the future
            return
    except OSError, e:
        sys.stderr.write("fork #1 failed: %d (%s)\n" % (e.errno, e.strerror))
        sys.exit(1)

    # decouple from parent environment
    #os.chdir("/") # we want the child processes to run from child_cwd
    os.setsid()
    os.umask(0)

    # do second fork
    try:
        pid = os.fork()
        if pid > 0:
            # exit from second parent
            sys.exit(0)
    except OSError, e:
        sys.stderr.write("fork #2 failed: %d (%s)\n" % (e.errno, e.strerror))
        sys.exit(1)

    # redirect standard file descriptors
    sys.stdout.flush()
    sys.stderr.flush()
    si = file('/dev/null', 'r')
    so = file('/dev/null', 'a+')
    se = file('/dev/null', 'a+', 0)
    os.dup2(si.fileno(), sys.stdin.fileno())
    os.dup2(so.fileno(), sys.stdout.fileno())
    os.dup2(se.fileno(), sys.stderr.fileno())

    pid = subprocess.Popen(cmd, cwd=child_cwd, shell=True).pid

    # write pidfile
    with open('pids/%s.pid' % pid, 'w') as f:
        f.write(str(pid))
    sys.exit(1)

def mkdir_if_none(path):
    if not os.access(path, os.R_OK):
        os.mkdir(path)

if __name__ == '__main__':
    try:
        cmd = sys.argv[1]
        num = int(sys.argv[2])
    except:
        print 'Usage: %s <cmd> <num procs>' % __file__
        sys.exit(1)
    mkdir_if_none('pids')
    mkdir_if_none('test_cwd')

    for i in xrange(num):
        print 'spawning %d...' % i
        spawn(cmd, 'test_cwd')
        time.sleep(0.01) # give the system some breathing room
In this situation, things seem to work fine, and the child processes persist even when the parent is killed. However, I'm still running into a spawn limit on the original parent. After ~650 spawns (not concurrent; the children have already finished) the parent process chokes with the error:
spawning 650...
fork #2 failed: 35 (Resource temporarily unavailable)
Is there any way to rewrite my spawn function so that I can spawn these independent child processes indefinitely? Thanks!
Thanks to your list of processes, I'm willing to say that this is because you have hit one of a number of fundamental limitations:

- nproc: the maximum number of processes a given user is allowed to execute -- see setrlimit(2), the bash(1) ulimit built-in, and /etc/security/limits.conf for details on per-user process limits.
- nofile: the maximum number of file descriptors a given process is allowed to have open at once. (Each new process probably creates three new pipes in the parent, for the child's stdin, stdout, and stderr descriptors.)
- /proc/sys/kernel/pid_max: the system-wide cap on process ids.
- /proc/sys/fs/file-max: the system-wide cap on open files.
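You can read the first two of those limits from within Python via the resource module (a quick illustration for Linux/BSD; RLIMIT_NPROC is not defined on every platform, hence the guard):

```python
import resource

# Per-user process limit (the usual culprit behind EAGAIN /
# "Resource temporarily unavailable" from fork(2)).
if hasattr(resource, 'RLIMIT_NPROC'):
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    print('nproc:  soft=%r hard=%r' % (soft, hard))

# Per-process open file descriptor limit.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('nofile: soft=%r hard=%r' % (soft, hard))
```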
Because you're not reaping your dead children, many of these resources are held open longer than they should be. Your second children are being properly handled by init(8) -- their parent is dead, so they are re-parented to init(8), and init(8) will clean up after them (wait(2)) when they die.
However, your program is responsible for cleaning up after the first set of children. C programs typically install a signal(7) handler for SIGCHLD that calls wait(2) or waitpid(2) to reap the children's exit statuses and thus remove their entries from the kernel's memory.
But signal handling in a script is a bit annoying. If you can set the SIGCHLD signal disposition to SIG_IGN explicitly, the kernel will know that you are not interested in the exit status and will reap the children for you.
Try adding:
import signal
signal.signal(signal.SIGCHLD, signal.SIG_IGN)
near the top of your program.
Note that I don't know what this does for subprocess. It might not be pleased. If that is the case, then you'll need to install a signal handler that calls wait(2) for you.
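Such a handler boils down to a non-blocking waitpid loop. Something like this sketch (note that waiting on -1 reaps any dead child, which can also steal exit statuses out from under Popen.wait):

```python
import os
import signal

def reap_children(signum=None, frame=None):
    """Collect the exit statuses of any dead children without blocking."""
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except OSError:   # no children left to wait for (ECHILD)
            break
        if pid == 0:      # children exist, but none have exited yet
            break

# Install it so zombies are collected as soon as they die.
signal.signal(signal.SIGCHLD, reap_children)
```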
I slightly modified your code and was able to run 5000 processes without any issues. So I agree with @sarnold that you hit some fundamental limitation. My modifications are:
    proc = subprocess.Popen(cmd, cwd=child_cwd, shell=True, close_fds=True)
    pid = proc.pid

    # write pidfile
    with open('pids/%s.pid' % pid, 'w') as f:
        f.write(str(pid))

    proc.wait()
    sys.exit(1)