
RFECV with parallel jobs

I would like to perform recursive feature elimination with built-in cross-validation (RFECV). My problem is that, although I have heavily subsampled my data, with my number of features (278) the process is far too slow and would probably not finish in the time I have allocated for my experiment.

I have seen that typical cross-validation in scikit-learn supports parallelization by defining the number of jobs that can run in parallel. Is it possible to parallelize the tasks in RFECV as well?

lefterav asked Oct 20 '25 18:10


1 Answer

The changelog for the scikit-learn 0.18 release shows that RFECV supports n_jobs as of that version.

Following the example in the RFECV documentation (I changed n_samples from 50 to 5000):

from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.svm import SVR

X, y = make_friedman1(n_samples=5000, n_features=5, random_state=0)
estimator = SVR(kernel="linear")

1 job: 22.5 s

%%time
selector = RFECV(estimator, step=1, cv=5, n_jobs=1)
selector = selector.fit(X, y)

CPU times: user 23.1 s, sys: 2.71 s, total: 25.8 s
Wall time: 22.5 s

4 jobs: 11.8 s

%%time
selector = RFECV(estimator, step=1, cv=5, n_jobs=4)
selector = selector.fit(X, y)

CPU times: user 3.42 s, sys: 312 ms, total: 3.74 s
Wall time: 11.8 s
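
Note that the much lower CPU time in the 4-job run is most likely because %%time only accounts for the parent process, while the fitting happens in worker processes; the wall time is the number to compare. For anyone who wants to repeat the comparison outside IPython, here is a minimal sketch that measures wall time with time.perf_counter. The helper name time_rfecv and its n_samples argument are my own additions for illustration, not part of scikit-learn, and timings will differ by machine.

```python
import time

from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.svm import SVR


def time_rfecv(n_jobs, n_samples=5000):
    """Fit RFECV once; return the fitted selector and wall-clock seconds.

    Hypothetical helper: same data and estimator as the example above,
    with only n_jobs (and optionally n_samples) varied.
    """
    X, y = make_friedman1(n_samples=n_samples, n_features=5, random_state=0)
    selector = RFECV(SVR(kernel="linear"), step=1, cv=5, n_jobs=n_jobs)
    start = time.perf_counter()  # wall-clock timer, unaffected by CPU accounting
    selector.fit(X, y)
    elapsed = time.perf_counter() - start
    return selector, elapsed
```

Calling time_rfecv(1) and then time_rfecv(4) and printing the second element of each result should give figures comparable to the wall times above. Passing a smaller n_samples makes for a quick sanity check before running the full comparison.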
Kevin answered Oct 26 '25 06:10