Assume yo = Yo() is a big object with a method double, which returns its argument multiplied by 2. If I pass yo.double to multiprocessing's imap, it is incredibly slow, because, I think, every function call creates a copy of yo. That is, this is very slow:
from tqdm import tqdm
from multiprocessing import Pool
import numpy as np

class Yo:
    def __init__(self):
        self.a = np.random.random((10000000, 10))

    def double(self, x):
        return 2 * x

yo = Yo()

with Pool(4) as p:
    for _ in tqdm(p.imap(yo.double, np.arange(1000))):
        pass
Output:
0it [00:00, ?it/s]
1it [00:06, 6.54s/it]
2it [00:11, 6.17s/it]
3it [00:16, 5.60s/it]
4it [00:20, 5.13s/it]
...
BUT, if I wrap yo.double in a function double_wrap and pass that to imap instead, it is essentially instantaneous.
def double_wrap(x):
    return yo.double(x)

with Pool(4) as p:
    for _ in tqdm(p.imap(double_wrap, np.arange(1000))):
        pass
Output:
0it [00:00, ?it/s]
1000it [00:00, 14919.34it/s]
How and why does wrapping the function change the behavior?
I use Python 3.6.6.
You are right about the copying. yo.double is a bound method, bound to your big object. When you pass it into the pool method, the whole instance gets pickled along with it, sent to the child processes and unpickled there. This happens for every chunk of the iterable a child process works on. The default value for chunksize in pool.imap is 1, so you are hitting this communication overhead for every single item in the iterable.
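Since the cost is paid once per chunk, you can amortize it by raising chunksize. A minimal sketch (the value 100 is an arbitrary choice of mine, not from the question):

# Same bound-method approach, but each task now pickles yo once per
# 100 items instead of once per item, cutting the overhead ~100x.
with Pool(4) as p:
    for _ in tqdm(p.imap(yo.double, np.arange(1000), chunksize=100)):
        pass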
In contrast, when you pass double_wrap, you are just passing a module-level function. Only its name will actually get pickled, and the child processes will import the function from __main__. Since you are obviously on an OS which supports forking, your double_wrap function has access to the forked yo instance of Yo. Your big object doesn't get serialized (pickled) in this case, hence the communication overhead is tiny compared to the other approach.
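You can make the difference visible by pickling both callables yourself. A small sketch reusing the Yo class and double_wrap from the question (the sizes in the comments are rough expectations, not measured output):

import pickle

yo = Yo()

def double_wrap(x):
    return yo.double(x)

# The bound method drags the whole instance along, including the
# roughly 800 MB array; the module-level function pickles as just
# a reference to its qualified name.
print(len(pickle.dumps(yo.double)))    # hundreds of megabytes
print(len(pickle.dumps(double_wrap)))  # a few dozen bytes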
@Darkonaut I just don't understand why making the function module-level prevents copying of the object. After all, the function needs a reference to the yo object itself, which should require all processes to copy yo, since they cannot share memory.
The function running in the child process will automatically find a reference to a global yo, because your operating system (OS) is using fork to create the child process. Forking produces a clone of your whole parent process, and as long as neither the parent nor the child alters a specific object, both will see the same object in the same memory location.
Only if the parent or the child changes something on the object does the object get copied in the child process. That's called "copy-on-write" and happens at OS level, without you taking notice of it in Python. Your code wouldn't work on Windows, which uses "spawn" as the start method for new processes.
I'm simplifying a bit above where I write "the object gets copied", since the unit the OS operates on is a "page" (most commonly of size 4 KB). This answer here would be a good follow-up read for broadening your understanding.
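If you do need this pattern under "spawn" (e.g. on Windows), one common workaround is to build the big object once per worker via Pool's initializer instead of relying on fork. A hedged sketch reusing the Yo class from the question; the names worker_yo, init_worker and double_in_worker are my own hypothetical choices:

from multiprocessing import Pool
from tqdm import tqdm
import numpy as np

worker_yo = None  # filled in per child process by the initializer

def init_worker():
    # Runs once in each child process, so the big object is built
    # locally instead of being pickled and shipped for every task.
    global worker_yo
    worker_yo = Yo()

def double_in_worker(x):
    return worker_yo.double(x)

if __name__ == "__main__":
    with Pool(4, initializer=init_worker) as p:
        for _ in tqdm(p.imap(double_in_worker, np.arange(1000))):
            pass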