I've noticed a huge delay when using multiprocessing (with joblib). Here is a simplified version of my code:
import numpy as np
from joblib import Parallel, delayed

class Matcher(object):
    def match_all(self, arr1, arr2):
        args = ((elem1, elem2) for elem1 in arr1 for elem2 in arr2)
        results = Parallel(n_jobs=-1)(delayed(_parallel_match)(self, e1, e2) for e1, e2 in args)
        # ...

    def match(self, i1, i2):
        return i1 == i2

def _parallel_match(m, i1, i2):
    return m.match(i1, i2)

matcher = Matcher()
matcher.match_all(np.ones(250), np.ones(250))
So if I run it as shown above, it takes about 30 secs to complete and uses almost 200 MB. If I just change the n_jobs parameter in Parallel and set it to 1, it takes only 1.80 secs and barely uses 50 MB...
I suppose it has something to do with the way I pass the arguments, but I haven't found a better way to do it...
I'm using Python 2.7.9.
Old multiprocessing backend: prior to version 0.12, joblib used 'multiprocessing' as its default backend instead of 'loky'. That backend creates a multiprocessing.Pool instance that forks the Python interpreter into multiple processes to execute each of the items of the list.
I have rewritten the code without the joblib library, and now it works the way it is supposed to, although the code is not as "beautiful":
import itertools
import multiprocessing
import numpy as np

class Matcher(object):
    def match_all(self, a1, a2):
        args = ((elem1, elem2) for elem1 in a1 for elem2 in a2)
        args = zip(itertools.repeat(self), args)
        pool = multiprocessing.Pool()
        # np.fromiter needs an explicit dtype
        results = np.fromiter(pool.map(_parallel_match, args), dtype=bool)
        # ...

    def match(self, i1, i2):
        return i1 == i2

def _parallel_match(*args):
    # args[0] is a (matcher, (i1, i2)) tuple produced by the zip above
    return args[0][0].match(*args[0][1:][0])

matcher = Matcher()
matcher.match_all(np.ones(250), np.ones(250))
This version works like a charm, and takes only 0.58 secs to complete...
So, why isn't it working at all with joblib? I can't really get my head around it, but I guess joblib is making copies of the whole array for every single process...
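That guess can be made concrete: with `delayed(_parallel_match)(self, e1, e2)`, the Matcher instance is serialised once per task, and 250 x 250 element pairs means 62,500 tasks. A rough back-of-the-envelope sketch (the payload built here only approximates what joblib actually pickles per task):

```python
import pickle

class Matcher(object):
    def match(self, i1, i2):
        return i1 == i2

m = Matcher()
# roughly what one delayed(...) call has to serialise: the instance
# plus the two scalar arguments
payload = pickle.dumps((m, 1.0, 1.0))

n_tasks = 250 * 250          # one task per element pair
total_bytes = n_tasks * len(payload)
print(n_tasks)               # 62500
```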