Passing a method of a big object to imap: 1000-fold speed-up by wrapping the method

Assume yo = Yo() is a big object with a method double, which returns its parameter multiplied by 2.

If I pass yo.double to imap of multiprocessing, it is incredibly slow, because (I think) every function call creates a copy of yo.

That is, this is very slow:

from tqdm import tqdm
from multiprocessing import Pool
import numpy as np


class Yo:
    def __init__(self):
        self.a = np.random.random((10000000, 10))

    def double(self, x):
        return 2 * x

yo = Yo()    

with Pool(4) as p:
    for _ in tqdm(p.imap(yo.double, np.arange(1000))):
        pass

Output:

0it [00:00, ?it/s]
1it [00:06,  6.54s/it]
2it [00:11,  6.17s/it]
3it [00:16,  5.60s/it]
4it [00:20,  5.13s/it]

...

But if I wrap yo.double with a function double_wrap and pass that to imap, it is essentially instantaneous.

def double_wrap(x):
    return yo.double(x)

with Pool(4) as p:
    for _ in tqdm(p.imap(double_wrap, np.arange(1000))):
        pass

Output:

0it [00:00, ?it/s]
1000it [00:00, 14919.34it/s]

How and why does wrapping the function change the behavior?

I use Python 3.6.6.

asked Nov 07 '22 by erensezener

1 Answer

You are right about the copying. yo.double is a 'bound method', bound to your big instance. When you pass it into a pool method, the whole instance gets pickled along with it, sent to the child processes and unpickled there. This happens for every chunk of the iterable a child process works on. The default value for chunksize in pool.imap is 1, so you are hitting this communication overhead for every processed item in the iterable.
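
You can see this directly by raising chunksize (a sketch reusing the question's setup; exact timings will vary): with chunksize=250 the 1000 items are sent as 4 chunks, so the instance is pickled 4 times in total instead of 1000 times. Each chunk is still expensive to send, but the overhead shrinks by a factor of the chunksize.

with Pool(4) as p:
    # yo is pickled once per chunk (4 times) instead of once
    # per item (1000 times with the default chunksize=1)
    for _ in tqdm(p.imap(yo.double, np.arange(1000), chunksize=250)):
        pass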

In contrast, when you pass double_wrap, you are just passing a module-level function. Only its name actually gets pickled, and the child processes import the function from __main__. Since you are obviously on an OS which supports forking, your double_wrap function has access to the forked yo instance of Yo. Your big object doesn't get serialized (pickled) in this case, hence the communication overhead is tiny compared to the other approach.
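
You can verify the difference in serialization cost with pickle itself (a quick check, run with the definitions from the question; note the first call materializes the pickled array in memory):

import pickle

# Pickling the bound method drags the whole instance along,
# including the ~800 MB array in yo.a ...
print(len(pickle.dumps(yo.double)))    # hundreds of megabytes

# ... while the module-level function is pickled by reference,
# essentially as just its qualified name.
print(len(pickle.dumps(double_wrap)))  # a few dozen bytes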


@Darkonaut I just don't understand why making the function module-level prevents copying of the object. After all, the function needs a pointer to the yo object itself – which should require all processes to copy yo, as they cannot share memory.

The function running in the child process will automatically find a reference to a global yo, because your operating system (OS) uses fork to create the child process. Forking leads to a clone of your whole parent process, and as long as neither the parent nor the child alters a specific object, both will see the same object at the same memory location.

Only when the parent or the child changes something on the object does it get copied in the child process. That's called "copy-on-write", and it happens at the OS level without you taking notice of it in Python. Your code wouldn't work on Windows, which uses 'spawn' as the start method for new processes.

Now I'm simplifying a bit above where I write "the object gets copied", since the unit the OS operates on is a "page" (most commonly 4 KB in size). This answer would be a good follow-up read to broaden your understanding.
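
If you need this to work under 'spawn' as well (e.g. on Windows), a common pattern is to build the big object once per worker with a Pool initializer, so it is constructed once per process rather than shipped once per item. A minimal sketch (the names _worker_yo, init_worker and double_in_worker are mine, not from the question):

from multiprocessing import Pool
from tqdm import tqdm
import numpy as np

class Yo:
    def __init__(self):
        self.a = np.random.random((10000000, 10))

    def double(self, x):
        return 2 * x

_worker_yo = None  # set per worker by the initializer

def init_worker():
    # Runs once in every child process right after it starts,
    # so each worker builds its own Yo instead of inheriting
    # the parent's copy via fork.
    global _worker_yo
    _worker_yo = Yo()

def double_in_worker(x):
    return _worker_yo.double(x)

if __name__ == '__main__':
    with Pool(4, initializer=init_worker) as p:
        for _ in tqdm(p.imap(double_in_worker, np.arange(1000))):
            pass

Note that each worker then holds its own copy of the big array, so you trade pickling overhead for per-worker memory.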

answered Nov 14 '22 by Darkonaut