Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Understanding Python fork and memory allocation errors

Tags:

python

linux

fork

I have a memory intensive Python application (between hundreds of MB to several GB).
I have a couple of VERY SMALL Linux executables the main application needs to run, e.g.

child = Popen("make html", cwd = r'../../docs', stdout = PIPE, shell = True)
child.wait()

When I run these external utilities (once, at the end of the long main process run) using subprocess.Popen I sometimes get OSError: [Errno 12] Cannot allocate memory.
I don't understand why... The requested process is tiny!
The system has enough memory for many more shells.

I'm using Linux (Ubuntu 12.10, 64 bits), so I guess subprocess calls Fork.
And Fork forks my existing process, thus doubling the amount of memory consumed, and fails??
What happened to "copy on write"?

Can I spawn a new process without fork (or at least without copying memory - starting fresh)?

Related:

The difference between fork(), vfork(), exec() and clone()

fork () & memory allocation behavior

Python subprocess.Popen erroring with OSError: [Errno 12] Cannot allocate memory after period of time

Python memory allocation error using subprocess.Popen

like image 318
Tal Weiss Avatar asked Mar 15 '13 22:03

Tal Weiss


People also ask

How does fork work in Python?

fork() method in Python is used to create a child process. This method work by calling the underlying OS function fork(). This method returns 0 in the child process and child's process id in the parent process.

Does Popen use fork?

Popen is more portable (in particular, it works on Windows). It creates a child process, but you must specify another program that the child process should execute. On Unix, it is implemented by calling os. fork (to clone the parent process), then os.

Can Python use virtual memory?

The operating system (OS) abstracts the physical memory and creates a virtual memory layer that applications (including Python) can access. An OS-specific virtual memory manager carves out a chunk of memory for the Python process.

How much memory is allocated to Python?

The issue is that 32-bit python only has access to ~4GB of RAM. This can shrink even further if your operating system is 32-bit, because of the operating system overhead.


1 Answers

It doesn't appear that a real solution will be forthcoming (i.e. an alternate implementation of subprocess that uses vfork). So how about a cute hack? At the beginning of your process, spawn a slave that hangs around with a small memory footprint, ready to spawn your subprocesses, and keep open communication to it throughout the life of the main process.

Here's an example using rfoo (http://code.google.com/p/rfoo/) with a named unix socket called rfoosocket (you could obviously use other connection types rfoo supports, or another RPC library):

Server:

import rfoo
import subprocess

class MyHandler(rfoo.BaseHandler):
    def RPopen(self, cmd):
        c = subprocess.Popen(cmd, stdout=subprocess.PIPE, shell=True)
        c.wait()
        return c.stdout.read()

rfoo.UnixServer(MyHandler).start('rfoosocket')

Client:

import rfoo

# Waste a bunch of memory before spawning the child. Swap out the RPC below
# for a straight popen to show it otherwise fails. Tweak to suit your
# available system memory.
mem = [x for x in range(100000000)]

c = rfoo.UnixConnection().connect('rfoosocket')

print rfoo.Proxy(c).RPopen('ls -l')

If you need real-time back and forth coprocess interaction with your spawned subprocesses this model probably won't work, but you might be able to hack it in. You'll presumably want to clean up the available args that can be passed to Popen based on your specific needs, but that should all be relatively straightforward.

You should also find it straightforward to launch the server at the start of the client, and to manage the socket file (or port) to be cleaned up on exit.

like image 107
Eric Angell Avatar answered Oct 20 '22 06:10

Eric Angell