Assume yo = Yo() is a big object with a method double, which returns its argument multiplied by 2. If I pass yo.double to multiprocessing's imap, it is incredibly slow, because, I think, every function call creates a copy of yo. That is, this is very slow:
from tqdm import tqdm
from multiprocessing import Pool
import numpy as np

class Yo:
    def __init__(self):
        self.a = np.random.random((10000000, 10))

    def double(self, x):
        return 2 * x

yo = Yo()

with Pool(4) as p:
    for _ in tqdm(p.imap(yo.double, np.arange(1000))):
        pass
Output:
0it [00:00, ?it/s]
1it [00:06, 6.54s/it]
2it [00:11, 6.17s/it]
3it [00:16, 5.60s/it]
4it [00:20, 5.13s/it]
...
BUT, if I wrap yo.double in a function double_wrap and pass that to imap instead, it is essentially instantaneous.
def double_wrap(x):
    return yo.double(x)

with Pool(4) as p:
    for _ in tqdm(p.imap(double_wrap, np.arange(1000))):
        pass
Output:
0it [00:00, ?it/s]
1000it [00:00, 14919.34it/s]
How and why does wrapping the function change the behavior?
I use Python 3.6.6.
You are right about the copying. yo.double is a bound method, bound to your big object. When you pass it into the pool method, the whole instance gets pickled along with it, sent to the child processes and unpickled there. This happens for every chunk of the iterable a child process works on. The default value for chunksize in pool.imap is 1, so you are hitting this communication overhead for every single item in the iterable.
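Since the cost is paid once per chunk, you can amortize it by raising chunksize. A minimal sketch (the value 100 is an arbitrary choice of mine, not from the question):

# Same bound-method approach, but each task now pickles yo once per
# 100 items instead of once per item, cutting the overhead ~100x.
with Pool(4) as p:
    for _ in tqdm(p.imap(yo.double, np.arange(1000), chunksize=100)):
        pass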
In contrast, when you pass double_wrap, you are just passing a module-level function. Only its name will actually get pickled, and the child processes will import the function from __main__. Since you are obviously on an OS which supports forking, your double_wrap function has access to the forked yo instance of Yo. Your big object doesn't get serialized (pickled) in this case, hence the communication overhead is tiny compared to the other approach.
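You can make the difference visible by pickling both callables yourself. A small sketch reusing the Yo class and double_wrap from the question (the sizes in the comments are rough expectations, not measured output):

import pickle

yo = Yo()

def double_wrap(x):
    return yo.double(x)

# The bound method drags the whole instance along, including the
# roughly 800 MB array; the module-level function pickles as just
# a reference to its qualified name.
print(len(pickle.dumps(yo.double)))    # hundreds of megabytes
print(len(pickle.dumps(double_wrap)))  # a few dozen bytes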
@Darkonaut I just don't understand why making the function module-level prevents copying of the object. After all, the function needs a reference to the yo object itself, which should require all processes to copy yo, since they cannot share memory.
The function running in the child process will automatically find a reference to a global yo, because your operating system (OS) is using fork to create the child process. Forking produces a clone of your whole parent process, and as long as neither the parent nor the child alters a specific object, both will see the same object in the same memory location.
Only if the parent or the child changes something on the object does the object get copied in the child process. That's called "copy-on-write" and happens at OS level, without you taking notice of it in Python. Your code wouldn't work on Windows, which uses "spawn" as the start method for new processes.
I'm simplifying a bit above where I write "the object gets copied", since the unit the OS operates on is a "page" (most commonly of size 4 KB). This answer here would be a good follow-up read for broadening your understanding.
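If you do need this pattern under "spawn" (e.g. on Windows), one common workaround is to build the big object once per worker via Pool's initializer instead of relying on fork. A hedged sketch reusing the Yo class from the question; the names worker_yo, init_worker and double_in_worker are my own hypothetical choices:

from multiprocessing import Pool
from tqdm import tqdm
import numpy as np

worker_yo = None  # filled in per child process by the initializer

def init_worker():
    # Runs once in each child process, so the big object is built
    # locally instead of being pickled and shipped for every task.
    global worker_yo
    worker_yo = Yo()

def double_in_worker(x):
    return worker_yo.double(x)

if __name__ == "__main__":
    with Pool(4, initializer=init_worker) as p:
        for _ in tqdm(p.imap(double_in_worker, np.arange(1000))):
            pass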