I am using sklearn to carry out recursive feature elimination with cross-validation, using the RFECV class. RFE repeatedly trains an estimator on the current set of features and removes the least informative ones, until it converges on the optimal number of features.
To get the best performance from the estimator, I want to select the best hyperparameters for each number of features (edited for clarity). The estimator is a linear SVM, so I am only looking at the C parameter.
Initially, my code was as follows. However, this just did one grid search for C at the beginning, and then used the same C for each iteration.
from sklearn.cross_validation import StratifiedKFold
from sklearn.feature_selection import RFECV
from sklearn import svm, grid_search

def get_best_feats(data, labels, c_values):
    parameters = {'C': c_values}

    # svm1 is passed to clf, which grid-searches the best parameters
    svm1 = svm.SVC(kernel='linear')
    clf = grid_search.GridSearchCV(svm1, parameters, refit=True)
    clf.fit(data, labels)
    # print 'best C', clf.best_params_['C']

    # svm2 uses the optimal hyperparameters found for svm1
    svm2 = svm.SVC(C=clf.best_params_['C'], kernel='linear')

    # svm2 is then passed to RFECV as the estimator for recursive feature elimination
    rfecv = RFECV(estimator=svm2, step=1, cv=StratifiedKFold(labels, 5))
    rfecv.fit(data, labels)

    print "support:", rfecv.support_
    return data[:, rfecv.support_]
The documentation for RFECV gives the parameter "estimator_params : Parameters for the external estimator. Useful for doing grid searches when an RFE object is passed as an argument to, e.g., a sklearn.grid_search.GridSearchCV object."
Therefore I want to try to pass my object 'rfecv' to the grid search object, as follows:
def get_best_feats2(data, labels, c_values):
    parameters = {'C': c_values}
    svm1 = svm.SVC(kernel='linear')
    rfecv = RFECV(estimator=svm1, step=1, cv=StratifiedKFold(labels, 5), estimator_params=parameters)
    rfecv.fit(data, labels)

    print "Kept {} out of {} features".format((data[:, rfecv.support_]).shape[1], data.shape[1])
    print "support:", rfecv.support_
    return data[:, rfecv.support_]

X, y = get_heart_data()
c_values = [0.1, 1., 10.]
get_best_feats2(X, y, c_values)
But this returns the error:
    max_iter=self.max_iter, random_seed=random_seed)
  File "libsvm.pyx", line 59, in sklearn.svm.libsvm.fit (sklearn/svm/libsvm.c:1674)
TypeError: a float is required
So my question is: how can I pass the rfe object to the grid search in order to do cross-validation for each iteration of recursive feature elimination?
Thanks
Recursive feature elimination (RFE) is a feature selection method that fits a model and removes the weakest feature (or features) until the specified number of features is reached.
Note that when a hyperparameter search is combined with model selection, the k-fold cross-validation procedure is used both to select the hyperparameters that configure each model and to select among the configured models.
Recursive Feature Elimination, or RFE for short, is a popular feature selection algorithm. RFE is popular because it is easy to configure and use and because it is effective at selecting those features (columns) in a training dataset that are more or most relevant in predicting the target variable.
RFE is also a type of backward selection method; however, it works through a feature-ranking system. A model (for example, a linear regression) is first fit on all variables. The variable coefficients are then used to compute each variable's importance, and the lowest-ranked variables are dropped.
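The ranking-and-dropping loop described above can be sketched by hand. This is only a minimal illustration, not scikit-learn's own implementation: the helper name `manual_rfe`, the synthetic dataset, and the choice of squared coefficients as the importance score are all assumptions made for the example, and it assumes a recent scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC


def manual_rfe(X, y, n_features_to_keep, C=1.0):
    """Hypothetical helper: drop the lowest-ranked feature one at a time.

    Returns the original column indices of the features that survive.
    """
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_features_to_keep:
        # Refit on the features that are still in play.
        model = SVC(kernel="linear", C=C).fit(X[:, remaining], y)
        # For a linear model, squared coefficients act as importance scores.
        importance = (model.coef_ ** 2).sum(axis=0)
        # Remove the feature with the smallest score.
        weakest = int(np.argmin(importance))
        del remaining[weakest]
    return remaining


# Synthetic data, used purely for illustration.
X, y = make_classification(n_samples=120, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)
kept = manual_rfe(X, y, n_features_to_keep=3)
print("kept columns:", kept)
```

Note that C is fixed for the whole loop here, which is exactly the limitation the question is about.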
So you want to grid-search the C in the SVM for each number of features in the RFE? Or for each CV iteration in the RFECV? From your last sentence, I guess it is the former.
You can do RFE(GridSearchCV(SVC(), param_grid)) to achieve that, though I'm not sure that is actually a helpful thing to do.
I don't think the second is possible right now (but it should be soon). You could do GridSearchCV(RFECV(), param_grid={'estimator__C': Cs_to_try}), but that nests two sets of cross-validation inside each other.
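For reference, here is a rough sketch of that second, nested option. It assumes a recent scikit-learn (where the classes live in `sklearn.model_selection`) and uses a synthetic dataset in place of the asker's data; the fold counts and C values are arbitrary choices for the example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic data, used purely for illustration.
X, y = make_classification(n_samples=120, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)

# Inner loop: RFECV picks the number of features by cross-validation.
rfecv = RFECV(estimator=SVC(kernel="linear"), step=1,
              cv=StratifiedKFold(n_splits=3))

# Outer loop: the grid search tries each C for the SVC wrapped by RFECV.
# The 'estimator__' prefix routes the parameter to RFECV's estimator.
search = GridSearchCV(rfecv, param_grid={"estimator__C": [0.1, 1.0, 10.0]},
                      cv=StratifiedKFold(n_splits=3))
search.fit(X, y)

best_rfecv = search.best_estimator_
print("best C:", search.best_params_["estimator__C"])
print("features kept:", best_rfecv.n_features_)
print("support:", best_rfecv.support_)
```

This selects one C for the whole elimination run rather than one per feature count, and every candidate C triggers a full RFECV inside each outer fold, so the cost grows quickly.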
Update:
GridSearchCV has no coef_, so the first one fails.
A simple fix:
class GridSearchWithCoef(GridSearchCV):
    @property
    def coef_(self):
        return self.best_estimator_.coef_
And then use that instead.
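A minimal sketch of how the wrapper might be plugged into RFE, so that C is re-tuned every time a feature is dropped. It assumes a recent scikit-learn; the synthetic dataset, the C grid, the inner `cv=3`, and `n_features_to_select=3` are illustrative choices, not values from the question.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


class GridSearchWithCoef(GridSearchCV):
    """GridSearchCV that exposes the best estimator's coef_ so RFE can rank features."""

    @property
    def coef_(self):
        return self.best_estimator_.coef_


# Synthetic data, used purely for illustration.
X, y = make_classification(n_samples=120, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)

# Each RFE iteration clones and refits this search, so C is chosen anew
# for every intermediate feature subset.
inner = GridSearchWithCoef(SVC(kernel="linear"),
                           param_grid={"C": [0.1, 1.0, 10.0]}, cv=3)
rfe = RFE(estimator=inner, n_features_to_select=3, step=1)
rfe.fit(X, y)

print("support:", rfe.support_)
print("C chosen for the final feature subset:", rfe.estimator_.best_params_["C"])
```

Here the number of features to keep is fixed up front; choosing it by cross-validation as well would mean wrapping the same estimator in RFECV, at the cost of another level of nesting.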