
A task failed to un-serialize

Tags: python, spyder

I'm trying to evaluate an ANN. With n_jobs = 1 I get the accuracies, but with n_jobs = -1 I get the following error: BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.

I have tried other values of n_jobs, but it only works with n_jobs = 1.

This is the code I am running:

accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10, n_jobs = -1)

This is the error I am getting:

 Traceback (most recent call last):

   File "<ipython-input-12-cc51c2d2980a>", line 1, in <module>
     accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10, n_jobs = -1)

   File "C:\Users\javie\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 402, in cross_val_score
     error_score=error_score)

   File "C:\Users\javie\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 240, in cross_validate
     for train, test in cv.split(X, y, groups))

   File "C:\Users\javie\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 930, in __call__
     self.retrieve()

   File "C:\Users\javie\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 833, in retrieve
     self._output.extend(job.get(timeout=self.timeout))

   File "C:\Users\javie\Anaconda3\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py", line 521, in wrap_future_result
     return future.result(timeout=timeout)

   File "C:\Users\javie\Anaconda3\lib\concurrent\futures\_base.py", line 432, in result
     return self.__get_result()

   File "C:\Users\javie\Anaconda3\lib\concurrent\futures\_base.py", line 384, in __get_result
     raise self._exception

 BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.

Spyder should have analyzed each batch in parallel, but even when I use n_jobs = 1 it only analyzes 10 epochs.

Javier Perez asked May 15 '19 17:05


3 Answers

This always happens when using multiprocessing in an IPython console in Spyder. A workaround is to run the script from the command line instead.

Forzaa answered Oct 30 '22 11:10


Just posting this for others, in case it's helpful. I ran into the same issue today while running GridSearchCV on a Dask array / cluster with scikit-learn 0.24.

Solved it by using the joblib context manager as described here: https://joblib.readthedocs.io/en/latest/parallel.html#thread-based-parallelism-vs-process-based-parallelism
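The answer does not say which backend it selected, so the following is only a minimal sketch of the joblib context manager, assuming the thread-based backend is acceptable for the workload. Threads share memory with the parent process, so task arguments do not have to be pickled. The function cube is a hypothetical stand-in for the real work.

```python
from joblib import Parallel, delayed, parallel_backend


def cube(i):
    # Hypothetical stand-in for the real per-task work.
    return i ** 3


# Inside this block, Parallel calls use threads instead of worker processes,
# so nothing has to be serialized and sent to another process.
with parallel_backend("threading"):
    results = Parallel(n_jobs=2)(delayed(cube)(i) for i in range(5))

print(results)
```

Calls such as cross_val_score(..., n_jobs=-1) placed inside the same with block should pick up the selected backend as well, since scikit-learn delegates its parallelism to joblib. Whether threads give any speed-up depends on how much of the work releases the GIL.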

rrpelgrim answered Oct 30 '22 11:10


I get this error as well, but only on Windows. I am using joblib to run a function (call it func_x) in parallel. That function is imported from a module, let's call it module_a.

module_a also uses a function (call it func_y) from another module, module_b, which it imports using the syntax import module_b.

I found that I can avoid the BrokenProcessPool error if I edit module_a and change the import line to from module_b import func_y.

I also had to remove the if __name__ == '__main__': guard from the main script that imports module_a.

I think this subtle difference in how modules are imported into the namespace determines whether that module can then be pickled by joblib for parallel processing on Windows.

I hope this helps!

--

A minimal reproducible example is below:

Original main.py

from joblib import Parallel, delayed
import module_a

if __name__ == '__main__':
    Parallel(n_jobs=4, verbose=3)(delayed(module_a.func_x)(i) for i in range(50))

Original module_a.py (fails on Windows with BrokenProcessPool error; kernel restart required)

import module_b

def func_x(i):
    j = i ** 3
    k = module_b.func_y(j)
    return k

Edited main.py

from joblib import Parallel, delayed
import module_a

Parallel(n_jobs=4, verbose=3)(delayed(module_a.func_x)(i) for i in range(50))

Edited module_a.py (succeeds on Windows)

from module_b import func_y # changed

def func_x(i):
    j = i ** 3
    k = func_y(j) # changed
    return k

module_b.py

def func_y(m):
    k = m ** 3
    return k
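
Since the error message asks for picklable arguments, a quick way to narrow things down is to try serializing each object before handing it to a parallel call. This is a rough sketch using only the standard pickle module; is_picklable is a hypothetical helper name, and joblib's process backend has its own serialization layer, so a result here is a first hint rather than a guarantee.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except Exception:
        # pickle raises different exception types depending on the object.
        return False
    return True


# Plain data structures serialize without trouble.
print(is_picklable({"cv": 10, "n_jobs": -1}))

# Lambdas and locks are typical objects that the standard pickler rejects.
print(is_picklable(lambda x: x ** 3))
print(is_picklable(threading.Lock()))
```

Running the helper on the estimator and on the training arrays can show which argument, if any, the standard pickler rejects.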
quokka answered Oct 30 '22 11:10