
Why is the memory usage of a child (python multiprocessing) process so different when sharing a ctypes.Structure with a string vs. only a string?

The following code uses multiprocessing's Array to share a large array of unicode strings across processes. If I use c_wchar_p as the element type, each child process's memory usage is about one quarter of the memory used in the parent process (the figure changes if I change the number of entries in the Array).

However, if I use a ctypes.Structure with a single c_wchar_p field, each child process's memory usage is constant and very low, while the parent process's memory usage doubles.

import ctypes
import multiprocessing
import random
import resource
import time

a = None

class Record(ctypes.Structure):
    _fields_ = [('value', ctypes.c_wchar_p)]
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return '(%s)' % (self.value,)

def child(i):
    while True:
        print "%ik memory used in child %i: %s" % (resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024, i, a[i])
        time.sleep(1)
        for j in xrange(len(a)):
            c = a[j]

def main():
    global a
    # uncomment this line and comment the next to switch
    #a = multiprocessing.Array(ctypes.c_wchar_p, [u'unicode %r!' % i for i in xrange(1000000)], lock=False)
    a = multiprocessing.Array(Record, [Record(u'unicode %r!' % i) for i in xrange(1000000)], lock=False)
    for i in xrange(5):
        p = multiprocessing.Process(target=child, args=(i + 1,))
        p.start()
    while True:
        print "%ik memory used in parent: %s" % (resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024, a[0])
        time.sleep(1)

if __name__ == '__main__':
    main()

Using c_wchar_p results in this output:

363224k memory used in parent: unicode 0!
72560k memory used in child 5: unicode 5!
72556k memory used in child 3: unicode 3!
72536k memory used in child 1: unicode 1!
72568k memory used in child 4: unicode 4!
72576k memory used in child 2: unicode 2!

Using Record results in this output:

712508k memory used in parent: (unicode 0!)
1912k memory used in child 1: (unicode 1!)
1908k memory used in child 2: (unicode 2!)
1904k memory used in child 5: (unicode 5!)
1904k memory used in child 4: (unicode 4!)
1908k memory used in child 3: (unicode 3!)

Why?

papercrane asked Feb 16 '12 04:02



1 Answer

I can't explain the increase in memory usage, but I don't think the code is really doing what you intend.

If you modify a[i] in the parent process, the child processes will not see the new value.

It's best not to pass pointers (which is exactly what the _p types are) between processes. As the multiprocessing docs put it:

Although it is possible to store a pointer in shared memory remember that this will refer to a location in the address space of a specific process. However, the pointer is quite likely to be invalid in the context of a second process and trying to dereference the pointer from the second process may cause a crash.
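
One possible way around this, sketched here under assumptions that are not in the original post, is to store the characters inline in a fixed-size c_wchar array, so that the text itself lives in the shared block instead of behind a pointer. The names InlineRecord, MAX_CHARS, make_shared and read_after_update are hypothetical, the 32-character cap is arbitrary, the code is written for Python 3, and it assumes a POSIX system where the 'fork' start method is available.

```python
import ctypes
import multiprocessing

# Assumption: no string is longer than 32 characters; longer values raise ValueError.
MAX_CHARS = 32


class InlineRecord(ctypes.Structure):
    # The characters are stored inside the structure, not behind a pointer.
    _fields_ = [('value', ctypes.c_wchar * MAX_CHARS)]


def make_shared(strings):
    """Copy the strings into a shared array of InlineRecord (no lock)."""
    shared = multiprocessing.Array(InlineRecord, len(strings), lock=False)
    for i, text in enumerate(strings):
        shared[i].value = text
    return shared


def read_after_update(shared, index, new_value):
    """Fork a child, change one entry in the parent, return what the child reads."""
    ctx = multiprocessing.get_context('fork')  # assumption: POSIX only
    parent_conn, child_conn = ctx.Pipe()

    def worker():
        child_conn.recv()                     # wait until the parent has written
        child_conn.send(shared[index].value)  # read straight from shared memory

    proc = ctx.Process(target=worker)
    proc.start()
    child_conn.close()                        # the parent only uses its own end
    shared[index].value = new_value           # modified after the fork
    parent_conn.send('go')
    seen = parent_conn.recv()
    proc.join()
    return seen
```

Calling read_after_update(make_shared(['unicode 0!', 'unicode 1!']), 1, 'changed by parent') should hand back the updated string, because the child reads the characters from the shared block rather than following an address from the parent's heap. The trade-off is that every record occupies MAX_CHARS * ctypes.sizeof(ctypes.c_wchar) bytes regardless of how short the string is.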

Nam Nguyen answered Nov 15 '22 04:11