How to find an optimum number of processes in GridSearchCV( ..., n_jobs = ... )?

Tags:

python

parallel-processing

machine-learning

scikit-learn

parallelism-amdahl

I'm wondering which is better to use with GridSearchCV( ..., n_jobs = ... ) to pick the best parameter set for a model: n_jobs = -1, or n_jobs with a big number, like n_jobs = 30?

According to the scikit-learn documentation:

n_jobs = -1 means that the computation will be dispatched on all the CPUs of the computer.

My PC has an Intel i3 CPU with 2 cores and 4 threads. Does that mean that setting n_jobs = -1 is implicitly the same as n_jobs = 2?


asked May 04 '18 21:05

Minions

2 Answers

... does that mean if I set n_jobs = -1, implicitly it will be equal to n_jobs = 2 ?

This one is easy :

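When n_jobs = -1 is requested, scikit-learn ( through joblib, which GridSearchCV() uses internally ) detects the number of CPUs on which it is reasonable to schedule concurrent ( independent ) processes. That number counts logical CPUs ( hardware threads ), so on a 2-core / 4-thread i3 it will normally resolve to 4 workers, not 2.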

[ image not preserved ] Funny to see a 3-CPU-core?

On virtualised machines, which can synthetically emulate CPUs / cores, the results are not as trivial as in your known Intel i3 case.

If in doubt, one can test this on a trivialised case ( on a really small data-set, not the full-blown model-space search ... ) and let the results prove it.

import psutil

# Physical cores vs. logical CPUs ( hardware threads ), as reported by psutil
print( "{0:17s}{1:} CPUs PHYSICAL".format( "psutil:", psutil.cpu_count( logical = False ) ) )
print( "{0:17s}{1:} CPUs LOGICAL".format(  "psutil:", psutil.cpu_count( logical = True  ) ) )
...

A similar host-platform "self-detection" may report more details for different systems / settings:

'''
sys:             linux 
                 3.6.1 (default, Jun 27 2017, 14:35:15)  .. [GCC 7.1.1 20170622 (Red Hat 7.1.1-3)]

multiprocessing: 1 CPU(s)
psutil:          1 CPUs PHYSICAL
psutil:          1 CPUs LOGICAL
psutil:          psutil.cpu_freq(  per_cpu = True  ) not able to report. ?( v5.1.0+ )
psutil:          5.0.1
psutil:          psutil.cpu_times( per_cpu = True  ) not able to report. ?( vX.Y.Z+ )
psutil:          5.0.1
psutil:          svmem(total=1039192064, available=257290240, percent=75.2, used=641396736, free=190361600, active=581107712, inactive=140537856, buffers=12210176, cached=195223552, shared=32768)
numexpr:         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ModuleNotFoundError: No module named 'numexpr'.
joblib:          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ModuleNotFoundError: No module named 'joblib'.
sklearn/joblib:  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ModuleNotFoundError: No module named 'sklearn.externals.joblib' 
'''

''' [i5]
>>> numexpr.print_versions()
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Numexpr version:   2.5
NumPy version:     1.10.4
Python version:    2.7.13 |Anaconda 4.0.0 (32-bit)| (default, May 11 2017, 14:07:41) [MSC v.1500 32 bit (Intel)]
AMD/Intel CPU?     True
VML available?     True
VML/MKL version:   Intel(R) Math Kernel Library Version 11.3.1 Product Build 20151021 for 32-bit applications
Number of threads used by default: 4 (out of 4 detected cores)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
'''
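The mapping from an n_jobs request to a worker count can be sketched with the standard library alone. This is a minimal illustration, not joblib's actual code: `resolve_n_jobs` is a hypothetical helper, and only the arithmetic for negative values ( n_cpus + 1 + n_jobs ) follows the convention documented by joblib. Everything else is an assumption made for the example.

```python
import os


def resolve_n_jobs(n_jobs, n_cpus=None):
    """Hypothetical helper: translate an n_jobs request into a worker count.

    Negative values follow the convention documented by joblib:
    n_cpus + 1 + n_jobs workers ( -1 is "all CPUs", -2 is "all but one" ).
    """
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_cpus is None:
        # os.cpu_count() reports logical CPUs ( hardware threads ); it may be None
        n_cpus = os.cpu_count() or 1
    if n_jobs < 0:
        return max(n_cpus + 1 + n_jobs, 1)
    # This sketch passes positive requests through unchanged
    return n_jobs


# A 2-core / 4-thread CPU exposes 4 logical CPUs to the operating system
print(resolve_n_jobs(-1, n_cpus=4))
print(resolve_n_jobs(-2, n_cpus=4))
print(resolve_n_jobs(-1))  # depends on the host this runs on
```

Passing `n_cpus` explicitly makes it easy to reason about a machine other than the one the snippet runs on, for example a virtualised host that reports an unusual core count.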

... which is better to use with GridSearchCV to pick the best parameter set for a model,
n_jobs = -1 or n_jobs with a big number like n_jobs = 30 ?

There is no easy "One-Size-Fits-All" answer on this :

The Scikit tools ( and many others that follow this practice ) spawn, whenever an n_jobs directive is used, the requested number of concurrent process-instances ( so as to escape from the shared GIL-lock stepping - read more on this elsewhere if interested in details ).

This process-instantiation is not cost-free: time-wise, it spends a respectable amount of [TIME]-domain costs, and space-wise, it spends at least n_jobs-times the RAM-allocations of a single python process-instance in the [SPACE]-domain.

Given this, your fight is a battle against a double-edged sword.

An attempt to "underbook" the CPU will possibly leave ( some ) CPU-cores idling.
An attempt to "overbook" the RAM-space will make your performance worse than expected, as virtual memory will make the operating system start swapping, which turns your Machine Learning-scaled data-access times from ~ 10+ [ns] into ~ 10+ [ms], more than 100,000 x slower, which is hardly what one will be pleased with.

The overall effect of n_jobs = a_reasonable_amount_of_processes is subject to Amdahl's Law ( the re-formulated, overhead-aware one, not the overhead-naive version ), so there will be a practical optimality peak ( a maximum ) in how many CPU-cores help to improve one's processing, beyond which the overhead-costs ( sketched for both the [TIME]- and [SPACE]-domains above ) will actually degrade any expected positive impact.
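To see why such a peak appears, here is a toy model of an overhead-aware speedup. It is a sketch under stated assumptions, not the exact re-formulation referred to above: the function names, the linear per-process overhead term and the example numbers ( 95 % parallel work, 1 % overhead per extra process ) are all assumptions chosen for illustration.

```python
def speedup(n_procs, parallel_fraction, overhead_per_proc):
    """Toy overhead-aware Amdahl model ( an assumption, not a measured law ).

    The runtime is normalised so that one process takes 1.0 time units.
    Each additional process adds a fixed setup cost, overhead_per_proc.
    """
    serial_part = 1.0 - parallel_fraction
    parallel_part = parallel_fraction / n_procs
    overhead = overhead_per_proc * (n_procs - 1)
    return 1.0 / (serial_part + parallel_part + overhead)


def best_n_procs(parallel_fraction, overhead_per_proc, max_procs=64):
    """Scan 1..max_procs and return the process count with the highest speedup."""
    candidates = range(1, max_procs + 1)
    return max(candidates, key=lambda n: speedup(n, parallel_fraction, overhead_per_proc))


# Illustrative numbers only: 95 % parallel work, 1 % overhead per extra process
peak = best_n_procs(0.95, 0.01)
print(peak, round(speedup(peak, 0.95, 0.01), 2))
print(30, round(speedup(30, 0.95, 0.01), 2))
```

In this model the speedup rises, peaks, and then falls as more processes are added, so a large request such as 30 ends up slower than the peak. On a real system the overhead term has to be measured, not assumed.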

Having used RandomForestRegressor() on really large data-sets in production, I can tell you the [SPACE]-domain is the worst of your enemies when trying to grow n_jobs any further, and no system-level tuning will ever overcome this boundary ( so more and more ultra-low latency RAM and more and more ( real ) CPU-cores is the only practical recipe for going into any larger n_jobs computing plans ).
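One rough way to respect the [SPACE]-domain limit is to cap n_jobs by how many worker processes fit into free RAM. The helper below is hypothetical and the figures are made up for illustration. In practice the available memory could be read from psutil.virtual_memory().available, and the per-worker footprint has to be measured on a small trial run.

```python
GIB = 1024 ** 3


def cap_n_jobs(n_cpus, available_bytes, bytes_per_worker):
    """Hypothetical helper: bound the worker count by both CPUs and RAM.

    bytes_per_worker is an estimate of one worker's peak memory footprint,
    which has to be measured, it cannot be derived from the model alone.
    """
    if bytes_per_worker <= 0:
        raise ValueError("bytes_per_worker must be positive")
    fits_in_ram = available_bytes // bytes_per_worker
    # Never go below one worker, never above the CPU count
    return max(1, min(n_cpus, fits_in_ram))


# Illustrative numbers: 4 logical CPUs, 8 GiB free, about 3 GiB per worker
print(cap_n_jobs(n_cpus=4, available_bytes=8 * GIB, bytes_per_worker=3 * GIB))
```

With these example numbers the RAM budget, not the CPU count, is the binding limit, which is the situation described above for large data-sets.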


answered Nov 14 '22 22:11

user3666197

An additional, simpler answer by Prof. Kevyn Collins-Thompson, from the course Applied Machine Learning in Python:

If I have 4 cores in my system, n_jobs = 30 (30 as an example) will be the same as n_jobs = 4. So no additional effect

So the maximum performance that can be obtained always is using n_jobs = -1

answered Nov 14 '22 23:11

Minions
