I would like to perform Recursive Feature Elimination with cross-validation (RFECV). My problem is that, although I have heavily subsampled my data, with 278 features the process is far too slow and would probably not finish in the time I have allocated for my experiment.
I have seen that the usual cross-validation utilities in scikit-learn support parallelization by letting you set the number of jobs to run in parallel. Is it possible to parallelize the tasks performed by RFECV?
The changelog for the version 0.18 release shows that RFECV now supports n_jobs.
Following the example in the RFECV documentation (I changed n_samples from 50 to 5000):
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.svm import SVR
X, y = make_friedman1(n_samples=5000, n_features=5, random_state=0)
estimator = SVR(kernel="linear")
%%time
selector = RFECV(estimator, step=1, cv=5, n_jobs=1)
selector = selector.fit(X, y)
CPU times: user 23.1 s, sys: 2.71 s, total: 25.8 s
Wall time: 22.5 s
%%time
selector = RFECV(estimator, step=1, cv=5, n_jobs=4)
selector = selector.fit(X, y)
CPU times: user 3.42 s, sys: 312 ms, total: 3.74 s
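Wall time: 11.8 s
So with n_jobs=4 the wall time drops from 22.5 s to 11.8 s, roughly a 2x speedup. The CPU times reported for the parallel run are lower most likely because %%time only counts CPU time spent in the parent process, not in the worker processes.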
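For anyone who wants to reproduce the comparison outside a notebook, here is a minimal sketch that times both settings with time.perf_counter instead of %%time. The helper name timed_rfecv and the smaller problem size (300 samples, 10 features, n_jobs=2) are my own choices for illustration, not part of the measurements above, so expect different timings. It also checks that the parallel run selects the same features as the serial one.

```python
import time

import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.svm import SVR


def timed_rfecv(X, y, n_jobs):
    """Fit RFECV with the given n_jobs; return (fitted selector, elapsed seconds).

    Hypothetical helper for this sketch, not part of scikit-learn.
    """
    selector = RFECV(SVR(kernel="linear"), step=1, cv=5, n_jobs=n_jobs)
    start = time.perf_counter()
    selector.fit(X, y)
    return selector, time.perf_counter() - start


# Deliberately small problem (an assumption) so the sketch finishes quickly.
X, y = make_friedman1(n_samples=300, n_features=10, random_state=0)

serial, t_serial = timed_rfecv(X, y, n_jobs=1)
parallel, t_parallel = timed_rfecv(X, y, n_jobs=2)

print(f"n_jobs=1: {t_serial:.2f} s, n_jobs=2: {t_parallel:.2f} s")
print("same features selected:", np.array_equal(serial.support_, parallel.support_))
```

On a problem this small the process start-up overhead can outweigh the gain, so the parallel run may not be faster; the benefit should show up as the dataset and the number of features grow.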