Python scikit-learn n_jobs

This is not a real issue, but I'd like to understand:

  • running sklearn from the Anaconda distribution on a Windows 7 system with 4 cores and 8 GB
  • fitting a KMeans model on a table of 200,000 samples × 200 values.
  • running with n_jobs = -1 (after adding the if __name__ == '__main__': line to my script): I see the script start 4 processes with 10 threads each. Each process uses about 25% of the CPU (total: 100%). This seems to work as expected.
  • running with n_jobs = 1: it stays on a single process (not a surprise), with 20 threads, and also uses 100% of the CPU.

My question: what is the point of using n_jobs (and joblib) if the library uses all cores anyway? Am I missing something? Is it a Windows-specific behaviour?

Bruno Hanzen asked Sep 24 '15 12:09



People also ask

What is n_jobs in sklearn?

n_jobs is an integer, specifying the maximum number of concurrently running workers. If 1 is given, no joblib parallelism is used at all, which is useful for debugging. If set to -1, all CPUs are used.

What is n_jobs in machine learning?

n_jobs: Specify the number of cores to use for key machine learning tasks.

Is Sklearn multithreaded?

Scikit-learn relies heavily on NumPy and SciPy, which internally call multi-threaded linear algebra routines implemented in libraries such as MKL, OpenBLAS or BLIS.
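If those linear algebra libraries are what keeps every core busy even with n_jobs = 1, one common way to check is to cap their thread pools through environment variables before NumPy is imported. The snippet below is only a minimal sketch: limit_native_threads is a hypothetical helper name, and which of the three variables actually takes effect depends on the BLAS/OpenMP build that NumPy is linked against.

```python
import os

# Variables read by common threaded backends when they are loaded.
# Assumption: which one matters depends on the BLAS/OpenMP build in use.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def limit_native_threads(n_threads=1):
    """Set the thread-count variables (hypothetical helper).

    Must be called before numpy / sklearn are imported, because the
    native libraries typically read these values only once, at load time.
    """
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n_threads)
    return {name: os.environ[name] for name in THREAD_ENV_VARS}


limits = limit_native_threads(1)
print(limits)
# Import numpy / sklearn only after this point.
```

If CPU usage with n_jobs = 1 drops to roughly one core after this, the extra load was coming from the native thread pools rather than from joblib.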

Does Sklearn use GPU?

By default it does not use GPU, especially if it is running inside Docker, unless you use nvidia-docker and an image with a built-in support. Scikit-learn is not intended to be used as a deep-learning framework and it does not provide any GPU support.


1 Answer

  • what is the point of using n_jobs (and joblib) if the library uses all cores anyway?

It does not. If you set n_jobs to -1, it will use all cores; if you set it to 1 or 2, it will use only one or two cores (tested with scikit-learn 0.20.3 under Linux).
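To make the mapping from n_jobs to a worker count concrete, here is a small sketch of the convention documented by joblib, where negative values count back from the number of CPUs (-1 means all of them, -2 all but one). resolve_n_jobs is a hypothetical helper written for illustration, not a scikit-learn or joblib function, and the cpu_count=4 argument simply mirrors the 4-core machine in the question.

```python
import os


def resolve_n_jobs(n_jobs, cpu_count=None):
    """Translate an n_jobs value into a number of workers (hypothetical helper)."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 -> all CPUs, -2 -> all but one, and so on; never below 1.
        return max(cpu_count + 1 + n_jobs, 1)
    # Positive values are taken as given.
    return n_jobs


for requested in (1, 2, -1, -2):
    print(requested, "->", resolve_n_jobs(requested, cpu_count=4))
```

This only describes how many joblib workers are requested; it says nothing about threads that native libraries may start inside each worker.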

Sim answered Sep 23 '22 20:09
