 

How do I fix/debug this multi-process TerminatedWorkerError thrown in scikit-learn?

I recently set up a new machine to aid in decreasing run times for fitting models and data wrangling.

I did some preliminary benchmarks and everything is mostly smooth, but I ran into a snag when I tried enabling multi-process workers in scikit-learn.

I've reduced the error to a minimal example that is independent of my original code, since I enabled this feature without a problem on a different machine and in a VM.

I've also done memory allocation checks to make sure my machine wasn't running out of available RAM. I have 16 GB of RAM so there should be no issue, but I've left the output of the test in case I missed something.

Given the traceback, as near as I can tell my OS is killing the worker process, but for the life of me I can't figure out why. As far as I can tell, my code will ONLY run when it uses a single CPU core.

I'm running Windows 10 on an AMD Ryzen 7 2700X with 16 GB of RAM.

Code

import sklearn
import numpy as np
import tracemalloc
import time


from sklearn.model_selection import cross_val_score
from numpy.random import randn
from sklearn.linear_model import Ridge


##################### memory allocation snapshot

tracemalloc.start()

start_time = time.time()
snapshot1 = tracemalloc.take_snapshot()

###################### model

X = randn(815000, 100)
y = randn(815000, 1)
mod = Ridge()
sc = cross_val_score(mod, X, y,verbose =10, n_jobs=3)

################### Second memory allocation snapshot

snapshot2 = tracemalloc.take_snapshot()
top_stats = snapshot2.compare_to(snapshot1, 'lineno')

print("[ Top 5 ]")
for stat in top_stats[:5]:
    print(stat)

The expected result is straightforward: cross_val_score should simply return the scores for the fitted model.

Error Output

[Parallel(n_jobs=3)]: Using backend LokyBackend with 3 concurrent workers.
[Parallel(n_jobs=3)]: Done   3 out of   3 | elapsed:    0.2s remaining:    0.0s
---------------------------------------------------------------------------
TerminatedWorkerError                     Traceback (most recent call last)
<ipython-input-18-b2bdfd425f82> in <module>
     16 y = randn(815000, 1)
     17 mod = Ridge()
---> 18 sc = cross_val_score(mod, X, y,verbose =10, n_jobs=3)

..........

TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. 
This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.

Memory Output

[ Top 5 ]
<ipython-input-18-b2bdfd425f82>:15: size=622 MiB (+622 MiB), count=3 (+3), average=207 MiB
<ipython-input-18-b2bdfd425f82>:16: size=6367 KiB (+6367 KiB), count=3 (+3), average=2122 KiB
~python37\lib\inspect.py:732: size=37.2 KiB (+26.2 KiB), count=596 (+419), average=64 B
~python37\lib\site-packages\sklearn\externals\joblib\numpy_pickle.py:292: size=7072 B (+3808 B), count=13 (+7), average=544 B
~python37\lib\pickle.py:549: size=5728 B (+3408 B), count=14 (+8), average=409 B
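
As a rough cross-check of the tracemalloc figures above, the expected size of the two arrays can be worked out by hand. This is only a sketch: it assumes randn returns float64 values (8 bytes each), and the variable names are made up for illustration.

```python
# Back-of-the-envelope size of X and y from the example.
# Assumption: both arrays hold float64 values, i.e. 8 bytes per element.
FLOAT64_BYTES = 8

rows, cols = 815_000, 100

x_bytes = rows * cols * FLOAT64_BYTES  # X = randn(815000, 100)
y_bytes = rows * 1 * FLOAT64_BYTES     # y = randn(815000, 1)

x_mib = x_bytes / 2**20  # bytes -> MiB
y_kib = y_bytes / 2**10  # bytes -> KiB

print(f"X: {x_mib:.1f} MiB")
print(f"y: {y_kib:.1f} KiB")
```

Both values land on the roughly 622 MiB and 6367 KiB reported for lines 15 and 16, so the allocations seen in the parent process are accounted for by X and y alone. tracemalloc only traces the process it runs in, so it says nothing about memory used inside the worker processes.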
ZdWhite asked Jan 11 '19 01:01



1 Answer

I figured out that my scipy module was incompatible with my Windows 10 C++ redistributable version.

All I did was download the latest Visual Studio and install the C++ redistributable update that is listed in the "individual components" section.

Once I installed that, I restarted my computer and ran:

import scipy
scipy.test()

Once that was actually running, I tried my code block above again and it worked.
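
For anyone trying to narrow down a similar crash, one option is to import each compiled package in a throwaway interpreter, so that a hard crash shows up as a non-zero exit code instead of taking down the notebook. This is only a sketch and not what I originally ran: the helper name import_exit_code is made up, and it assumes the incompatibility shows up at import time, which may not hold in every case.

```python
import subprocess
import sys


def import_exit_code(module_name, timeout=120):
    """Import module_name in a fresh interpreter and return its exit code.

    0 means the import succeeded; anything else means the child failed,
    either with a normal Python exception or with a hard crash.
    """
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    return result.returncode


if __name__ == "__main__":
    # Hypothetical list of packages to probe; adjust to your environment.
    for name in ("numpy", "scipy", "scipy.linalg", "sklearn"):
        print(f"{name}: exit code {import_exit_code(name)}")
```

If every import exits with 0, the problem is probably not at import time, and running scipy.test() as above is the more thorough check.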

I think what this boils down to is having installed an old build of Windows 10 alongside a brand new version of Python and scipy.

This took a LONG time to solve and debug. Hopefully it helps.

ZdWhite answered Nov 02 '22 21:11
