
Error while using joblib with imported function

I'm using joblib to parallelize my Python 3.5 code.
If I do:

from modules import f
from joblib import Parallel, delayed

if __name__ == '__main__':
    Parallel(n_jobs=n_jobs, backend="multiprocessing")(delayed(f)(i) for i in range(10))

the code doesn't work. If I define f in the main script instead:

from joblib import Parallel, delayed

def f(i):
    ...  # my func ...

if __name__ == '__main__':
    Parallel(n_jobs=n_jobs, backend="multiprocessing")(delayed(f)(i) for i in range(10))

This works!

Can someone explain why I have to put all my functions in the same script?

That is really impractical, because modules contains plenty of functions I wrote that I don't want to copy and paste into the main script.

Grg asked Oct 19 '17 10:10


2 Answers

I faced a similar issue: when I call an imported function, it just freezes, but when I call a locally defined function it works fine. I solved it by using multithreading instead of multiprocessing, like this:

Parallel(n_jobs=n_jobs, backend='threading')(delayed(f)(i) for i in range(10))
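
A minimal, self-contained sketch of the threading variant above. Here `slow_square` is a hypothetical stand-in for the imported `f`, and `run_threaded` and `n_jobs=2` are arbitrary choices for illustration. Keep in mind that threads share one interpreter, so this mostly helps when the function waits on I/O or releases the GIL; pure-Python CPU-bound work may see little speedup.

```python
import time

from joblib import Parallel, delayed


def slow_square(i):
    """Hypothetical stand-in for the imported function f."""
    time.sleep(0.01)  # simulate waiting on I/O
    return i * i


def run_threaded(n_jobs=2):
    # backend="threading" runs the calls in threads of the current process,
    # so the function never has to be sent to a separate worker process.
    return Parallel(n_jobs=n_jobs, backend="threading")(
        delayed(slow_square)(i) for i in range(10)
    )


if __name__ == "__main__":
    print(run_threaded())
```

Results come back as a list in the same order as the inputs.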

user2633719 answered Sep 23 '22 18:09


I found a workaround that lets you keep the helper functions in a separate module. For each imported function that you want to parallelize, define a proxy function in your main module, e.g.

def f_proxy(*args, **kwargs):
    return f(*args, **kwargs)

and simply use delayed(f_proxy). It is still somewhat unsatisfactory, but cleaner than moving all helper functions into the main module.
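
Put together, the proxy pattern might look like the sketch below. To keep it self-contained, `math.factorial` stands in for `from modules import f`, and `run` with its `n_jobs` and `backend` parameters is a hypothetical helper, not part of the original code. The default backend here is "threading" only so the sketch runs anywhere; pass `backend="multiprocessing"` to reproduce the setup from the question.

```python
from math import factorial as f  # stand-in for: from modules import f

from joblib import Parallel, delayed


def f_proxy(*args, **kwargs):
    # Defined in the main module; it only forwards to the imported function.
    return f(*args, **kwargs)


def run(n_jobs=2, backend="threading"):
    # Hypothetical helper: hands the proxy, not f itself, to delayed().
    return Parallel(n_jobs=n_jobs, backend=backend)(
        delayed(f_proxy)(i) for i in range(10)
    )


if __name__ == "__main__":
    print(run())
```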

Ben JW answered Sep 22 '22 18:09