I've noticed a huge delay when using multiprocessing (with joblib). Here is a simplified version of my code:
import numpy as np
from joblib import Parallel, delayed

class Matcher(object):
    def match_all(self, arr1, arr2):
        args = ((elem1, elem2) for elem1 in arr1 for elem2 in arr2)
        results = Parallel(n_jobs=-1)(delayed(_parallel_match)(self, e1, e2) for e1, e2 in args)
        # ...

    def match(self, i1, i2):
        return i1 == i2

def _parallel_match(m, i1, i2):
    return m.match(i1, i2)

matcher = Matcher()
matcher.match_all(np.ones(250), np.ones(250))
So if I run it as shown above, it takes about 30 secs to complete and uses almost 200 MB. If I just change the n_jobs parameter in Parallel and set it to 1, it takes only 1.80 secs and barely uses 50 MB...
I suppose it has something to do with the way I pass the arguments, but I haven't found a better way to do it...
I'm using Python 2.7.9.
Old multiprocessing backend: prior to version 0.12, joblib used 'multiprocessing' as its default backend instead of 'loky'. That backend creates a multiprocessing.Pool instance that forks the Python interpreter into multiple processes to execute each of the items of the list.
I have rewritten the code without the joblib library, and now it works the way it is supposed to, although the code is not as "beautiful":
import itertools
import multiprocessing
import numpy as np

class Matcher(object):
    def match_all(self, a1, a2):
        args = ((elem1, elem2) for elem1 in a1 for elem2 in a2)
        args = zip(itertools.repeat(self), args)
        pool = multiprocessing.Pool()
        # np.fromiter needs an explicit dtype
        results = np.fromiter(pool.map(_parallel_match, args), dtype=bool)
        # ...

    def match(self, i1, i2):
        return i1 == i2

def _parallel_match(*args):
    # args[0] is a (matcher, (i1, i2)) tuple produced by the zip above
    return args[0][0].match(*args[0][1:][0])

matcher = Matcher()
matcher.match_all(np.ones(250), np.ones(250))
This version works like a charm, and takes only 0.58 secs to complete...
So, why isn't it working at all with joblib? I can't really get my head around it, but I guess joblib is making copies of the whole array for every single process...
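That guess can be made concrete: with `delayed(_parallel_match)(self, e1, e2)`, the Matcher instance is serialised once per task, and 250 x 250 element pairs means 62,500 tasks. A rough back-of-the-envelope sketch (the payload built here only approximates what joblib actually pickles per task):

```python
import pickle

class Matcher(object):
    def match(self, i1, i2):
        return i1 == i2

m = Matcher()
# roughly what one delayed(...) call has to serialise: the instance
# plus the two scalar arguments
payload = pickle.dumps((m, 1.0, 1.0))

n_tasks = 250 * 250          # one task per element pair
total_bytes = n_tasks * len(payload)
print(n_tasks)               # 62500
```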