For some reason the code below is using all available cores even though I have set n_jobs equal to 1. Have I missed something, or should I file an issue on the scikit-learn tracker?
import numpy as np
from sklearn import linear_model

# n_jobs=1 should restrict scikit-learn itself to a single core
liReg = linear_model.LinearRegression(n_jobs=1)

a = np.random.rand(10000, 20)
b = np.random.rand(10000)

for i in range(1000):
    liReg.fit(a, b)
    liReg.predict(a)
I have two identical servers, one running scikit-learn 0.18 and the other 0.17; this only happens with 0.18.
Here is the output of time python example.py:
Using 0.17 - just uses one core:
real 0m8.381s
user 0m6.387s
sys 0m1.677s
Using 0.18 - uses all cores:
real 0m32.308s # I guess longer due to overhead of parallel process management
user 2m53.612s
sys 20m48.285s
From @GaelVaroquaux on GitHub: https://github.com/scikit-learn/scikit-learn/issues/8883#issuecomment-301567818
Most likely you are using a parallel-enabled linear algebra library (like MKL or openBLAS). Hence, it is not scikit-learn that is doing parallel computing, and it cannot control it (it is a component that is used inside scikit-learn). You need to find out how to control the corresponding computing brick.
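To see which linear algebra backend is actually in play, you can ask NumPy which BLAS/LAPACK libraries it was built against. A minimal check (look for "openblas" or "mkl" in the output):

import numpy as np

# Prints the build configuration, including the BLAS/LAPACK libraries
# NumPy is linked against
np.show_config()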
In my case I was using OpenBLAS on Fedora Linux, so I simply added
export OPENBLAS_NUM_THREADS=1
to my .bashrc to disable multithreading inside the linear algebra calls.
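If you would rather keep the restriction inside the script than in .bashrc, setting the variable before NumPy is imported should have the same effect. This is a sketch assuming an OpenBLAS-backed NumPy; an MKL build would use MKL_NUM_THREADS instead:

import os

# Must be set before numpy (and therefore OpenBLAS) is loaded;
# use MKL_NUM_THREADS instead if NumPy is linked against MKL.
os.environ["OPENBLAS_NUM_THREADS"] = "1"

import numpy as np
from sklearn import linear_model

liReg = linear_model.LinearRegression(n_jobs=1)
liReg.fit(np.random.rand(10000, 20), np.random.rand(10000))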