I have a memory-intensive Python application (anywhere from hundreds of MB to several GB).
I have a couple of VERY SMALL Linux executables the main application needs to run, e.g.
child = Popen("make html", cwd = r'../../docs', stdout = PIPE, shell = True)
child.wait()
When I run these external utilities (once, at the end of the long main process run) using subprocess.Popen, I sometimes get OSError: [Errno 12] Cannot allocate memory.
I don't understand why... The requested process is tiny!
The system has enough memory for many more shells.
I'm using Linux (Ubuntu 12.10, 64-bit), so I guess subprocess calls fork().
And fork() duplicates my existing process, thus doubling the amount of memory consumed, and fails??
What happened to "copy on write"?
Can I spawn a new process without fork (or at least without copying memory - starting fresh)?
Related:
The difference between fork(), vfork(), exec() and clone()
fork () & memory allocation behavior
Python subprocess.Popen erroring with OSError: [Errno 12] Cannot allocate memory after period of time
Python memory allocation error using subprocess.Popen
The os.fork() method in Python creates a child process. It works by calling the underlying OS function fork(). It returns 0 in the child process and the child's process ID in the parent process.
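To make the two return values concrete, here is a minimal sketch, assuming a Unix-like system where os.fork() is available. The function name fork_demo and the exit code 7 are made up for illustration.

```python
import os


def fork_demo():
    """Fork once; return (pid seen by the parent, child's exit code)."""
    pid = os.fork()
    if pid == 0:
        # Child: fork() returned 0. Leave immediately, skipping the
        # parent's cleanup handlers, with an arbitrary exit code.
        os._exit(7)
    # Parent: fork() returned the child's process ID. Reap the child.
    _, status = os.waitpid(pid, 0)
    return pid, os.WEXITSTATUS(status)


if __name__ == "__main__":
    child_pid, exit_code = fork_demo()
    print(child_pid, exit_code)
```

The parent sees a positive PID and the child sees 0, which is how the two copies of the same code tell themselves apart.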
Popen is more portable (in particular, it works on Windows). It creates a child process, but you must specify another program that the child process should execute. On Unix, it is implemented by calling os.fork (to clone the parent process), then os.
The operating system (OS) abstracts the physical memory and creates a virtual memory layer that applications (including Python) can access. An OS-specific virtual memory manager carves out a chunk of memory for the Python process.
The issue is that 32-bit python only has access to ~4GB of RAM. This can shrink even further if your operating system is 32-bit, because of the operating system overhead.
It doesn't appear that a real solution will be forthcoming (i.e. an alternate implementation of subprocess that uses vfork). So how about a cute hack? At the beginning of your process, spawn a slave that hangs around with a small memory footprint, ready to spawn your subprocesses, and keep open communication to it throughout the life of the main process.
Here's an example using rfoo (http://code.google.com/p/rfoo/) with a named Unix socket called rfoosocket (you could obviously use any other connection type rfoo supports, or another RPC library):
Server:
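    import rfoo
    import subprocess

    class MyHandler(rfoo.BaseHandler):
        def RPopen(self, cmd):
            c = subprocess.Popen(cmd, stdout=subprocess.PIPE, shell=True)
            # communicate() drains stdout while waiting, so a child with a
            # lot of output cannot fill the pipe and block forever.
            out, _ = c.communicate()
            return out

    rfoo.UnixServer(MyHandler).start('rfoosocket')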
Client:
    import rfoo

    # Waste a bunch of memory before spawning the child. Swap out the RPC below
    # for a straight popen to show it otherwise fails. Tweak to suit your
    # available system memory.
    mem = [x for x in range(100000000)]

    c = rfoo.UnixConnection().connect('rfoosocket')
    print(rfoo.Proxy(c).RPopen('ls -l'))
If you need real-time back and forth coprocess interaction with your spawned subprocesses this model probably won't work, but you might be able to hack it in. You'll presumably want to clean up the available args that can be passed to Popen based on your specific needs, but that should all be relatively straightforward.
You should also find it straightforward to launch the server at the start of the client, and to manage the socket file (or port) to be cleaned up on exit.
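If you would rather not depend on an RPC library, the same idea can probably be sketched with the standard library alone. This is a rough sketch, not a drop-in replacement: it assumes Python 3 on Linux (it asks multiprocessing for the "fork" start method explicitly), and the names SpawnHelper and _helper_loop are made up for illustration. The important part is to create the helper before the main process allocates its large data, so the helper stays small and its own fork calls stay cheap.

```python
import multiprocessing
import subprocess


def _helper_loop(conn):
    """Runs in the small helper process: execute commands sent by the parent."""
    while True:
        try:
            cmd = conn.recv()
        except EOFError:
            break  # parent went away
        if cmd is None:  # sentinel: parent asked the helper to exit
            break
        proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
        out, _ = proc.communicate()  # drain stdout while waiting
        conn.send((proc.returncode, out))
    conn.close()


class SpawnHelper(object):
    """Hypothetical wrapper; create it before the main process grows."""

    def __init__(self):
        # Assumption: a Unix system where the "fork" start method exists.
        ctx = multiprocessing.get_context("fork")
        self._conn, child_conn = ctx.Pipe()
        self._proc = ctx.Process(target=_helper_loop, args=(child_conn,))
        self._proc.daemon = True  # do not outlive the main process
        self._proc.start()
        child_conn.close()  # the parent only needs its own end

    def run(self, cmd):
        """Ask the helper to run a shell command; return (returncode, stdout)."""
        self._conn.send(cmd)
        return self._conn.recv()

    def close(self):
        self._conn.send(None)
        self._proc.join()


if __name__ == "__main__":
    helper = SpawnHelper()               # start while memory use is still low
    mem = [x for x in range(10000000)]   # now grow the main process
    print(helper.run("ls -l"))
    helper.close()
```

As with the rfoo version, this only covers run-and-collect-output usage. Anything interactive would need a richer protocol over the pipe.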