 

Using multiprocessing.Manager.list instead of a real list makes the calculation take ages

I wanted to try different ways of using multiprocessing starting with this example:

$ cat multi_bad.py 
import multiprocessing as mp
from time import sleep
from random import randint

def f(l, t):
#   sleep(30)
    return sum(x < t for x in l)

if __name__ == '__main__':
    l = [randint(1, 1000) for _ in range(25000)]
    t = [randint(1, 1000) for _ in range(4)]
#   sleep(15)
    pool = mp.Pool(processes=4)
    result = pool.starmap_async(f, [(l, x) for x in t])
    print(result.get())

Here, l is a list that gets copied to each of the 4 worker processes when they are spawned. To avoid that, the documentation suggests using queues, shared arrays, or proxy objects created with multiprocessing.Manager. For the last option, I changed the definition of l:

$ diff multi_bad.py multi_good.py 
10c10,11
<     l = [randint(1, 1000) for _ in range(25000)]
---
>     man = mp.Manager()
>     l = man.list([randint(1, 1000) for _ in range(25000)])

The results still look correct, but the execution time has increased so dramatically that I think I'm doing something wrong:

$ time python multi_bad.py 
[17867, 11103, 2021, 17918]

real    0m0.247s
user    0m0.183s
sys 0m0.010s

$ time python multi_good.py 
[3609, 20277, 7799, 24262]

real    0m15.108s
user    0m28.092s
sys 0m6.320s

The docs do say that this way is slower than shared arrays, but this just feels wrong. I'm also not sure how I can profile this to get more information on what's going on. Am I missing something?

P.S. With shared arrays I get times below 0.25s.

P.P.S. This is on Linux and Python 3.3.
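
A minimal sketch of a shared-array version (not necessarily exactly what I timed; it relies on the fork start method, so the pool workers inherit the array instead of getting pickled copies):

import multiprocessing as mp
from random import randint

def f(t):
    # arr is the module-level shared array created below; since the
    # pool workers are forked after it exists, they inherit it and
    # nothing is pickled per task.
    return sum(x < t for x in arr)

if __name__ == '__main__':
    # lock=False is fine because the workers only read.
    arr = mp.Array('i', [randint(1, 1000) for _ in range(25000)], lock=False)
    t = [randint(1, 1000) for _ in range(4)]
    pool = mp.Pool(processes=4)
    print(pool.map(f, t))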

asked Oct 29 '12 by Lev Levitsky




2 Answers

Linux uses copy-on-write when subprocesses are forked (via os.fork). To demonstrate:

import multiprocessing as mp
import numpy as np
import logging
import os

logger = mp.log_to_stderr(logging.WARNING)

def free_memory():
    total = 0
    with open('/proc/meminfo', 'r') as f:
        for line in f:
            line = line.strip()
            if any(line.startswith(field) for field in ('MemFree', 'Buffers', 'Cached')):
                field, amount, unit = line.split()
                amount = int(amount)
                if unit != 'kB':
                    raise ValueError(
                        'Unknown unit {u!r} in /proc/meminfo'.format(u = unit))
                total += amount
    return total

def worker(i):
    x = data[i,:].sum()    # Exercise access to data
    logger.warn('Free memory: {m}'.format(m = free_memory()))

def main():
    procs = [mp.Process(target = worker, args = (i, )) for i in range(4)]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()

# `data` is defined at module level (not under the __main__ guard) so
# that the forked worker processes can see it as a global and inherit
# it via copy-on-write.
logger.warn('Initial free: {m}'.format(m = free_memory()))
N = 15000
data = np.ones((N,N))
logger.warn('After allocating data: {m}'.format(m = free_memory()))

if __name__ == '__main__':
    main()

which yielded

[WARNING/MainProcess] Initial free: 2522340
[WARNING/MainProcess] After allocating data: 763248
[WARNING/Process-1] Free memory: 760852
[WARNING/Process-2] Free memory: 757652
[WARNING/Process-3] Free memory: 757264
[WARNING/Process-4] Free memory: 756760

This shows that initially there was roughly 2.5 GB of free memory. After allocating a 15000x15000 array of float64s, 763248 kB were free. This roughly makes sense, since 15000**2 * 8 bytes = 1.8 GB, and the drop in free memory, 2.5 GB - 0.76 GB, is also roughly 1.8 GB.

Now, after each process is spawned, free memory is again reported to be ~750 MB. There is no significant decrease in free memory, so I conclude the system must be using copy-on-write.

Conclusion: If you do not need to modify the data, defining it at the global level of the __main__ module is a convenient and (at least on Linux) memory-friendly way to share it among subprocesses.
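
Applied to the code from the question, this approach could look roughly like this (a sketch; note that the list holds ordinary Python objects, so iterating it still touches reference counts and copies some pages, but nothing gets pickled and sent per task):

import multiprocessing as mp
from random import randint

# Module-level data: on Linux the forked pool workers inherit it
# copy-on-write instead of receiving a pickled copy with every task.
l = [randint(1, 1000) for _ in range(25000)]

def f(t):
    return sum(x < t for x in l)

if __name__ == '__main__':
    thresholds = [randint(1, 1000) for _ in range(4)]
    pool = mp.Pool(processes=4)
    print(pool.map(f, thresholds))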

answered by unutbu


This is to be expected, because accessing a shared object means that every request has to be pickled, sent to the manager process over a connection, unpickled and executed there, and the result has to come back the same way.

Basically, you should try to avoid sharing memory as much as you can. It leads to more debuggable code (because you have far less concurrency to reason about), and the speed-up is greater.

Shared memory should only be used if it is really needed, e.g. when sharing gigabytes of data so that copying it would require too much RAM, or when the processes must interact through that shared memory.

On a side note, using a Manager is probably much slower than a shared Array because the Manager must be able to handle arbitrary Python objects (any PyObject *), and thus has to pickle/unpickle everything, while the arrays can avoid most of this overhead.

From the multiprocessing's documentation:

Managers provide a way to create data which can be shared between different processes. A manager object controls a server process which manages shared objects. Other processes can access the shared objects by using proxies.

So using a Manager means spawning an extra process that exists just to manage the shared objects, which is probably why it takes so much longer.
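
You can see that extra process directly; a small sketch (the exact process name may differ, but it is typically a SyncManager child):

import multiprocessing as mp

if __name__ == '__main__':
    print(mp.active_children())   # no children yet: []
    man = mp.Manager()
    # Creating the Manager started a dedicated server process that owns
    # the shared objects; every proxy operation talks to it over a connection.
    print(mp.active_children())   # now typically shows a SyncManager process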

If you profile the speed of the proxy, it is a lot slower than a non-shared list:

>>> import timeit
>>> import multiprocessing as mp
>>> man = mp.Manager()
>>> L = man.list(range(25000))
>>> timeit.timeit('L[0]', 'from __main__ import L')
50.490395069122314
>>> L = list(range(25000))
>>> timeit.timeit('L[0]', 'from __main__ import L')
0.03588080406188965
>>> 50.490395069122314 / _
1407.1701119638526

An Array, on the other hand, is not nearly as slow:

>>> L = mp.Array('i', range(25000))
>>> timeit.timeit('L[0]', 'from __main__ import L')
0.6133401393890381
>>> 0.6133401393890381 / 0.03588080406188965
17.09382371507359

Since even the most elementary operations are slow, and I don't think there is much hope of speeding them up, this means that if you have to share a big list of data and want fast access to it, then you ought to use an Array.

Something that might speed things up a bit is accessing more than one element at a time (e.g. getting slices instead of single elements), but depending on what you want to do this may or may not be possible.
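
For example, a quick sketch that compares the two access patterns on a manager list (actual numbers depend on the machine, but the single slice should win by a large margin because it is one round trip instead of 25000):

import multiprocessing as mp
import timeit

if __name__ == '__main__':
    man = mp.Manager()
    L = man.list(range(25000))

    # One proxy round trip (pickle, send, unpickle) per element.
    per_element = timeit.timeit('[L[i] for i in range(25000)]',
                                'from __main__ import L', number=1)
    # A single round trip that returns the whole list at once.
    one_slice = timeit.timeit('L[:]', 'from __main__ import L', number=1)

    print(per_element, one_slice)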

answered by Bakuriu