I am trying to understand how to read the grid_scores_ and ranking_ values in RFECV. Here is the main example from the documentation:
>>> from sklearn.datasets import make_friedman1
>>> from sklearn.feature_selection import RFECV
>>> from sklearn.svm import SVR
>>> X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
>>> estimator = SVR(kernel="linear")
>>> selector = RFECV(estimator, step=1, cv=5)
>>> selector = selector.fit(X, y)
>>> selector.support_
array([ True,  True,  True,  True,  True, False, False, False, False,
       False], dtype=bool)
>>> selector.ranking_
array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])
How am I supposed to read ranking_ and grid_scores_? Is a lower ranking value better (or vice versa)? The reason I ask is that I have noticed that the features with the highest ranking values typically have the highest scores in grid_scores_.
However, if something has ranking = 1, shouldn't that mean it was ranked as the best of the group? This is also what the documentation says:
"Selected (i.e., estimated best) features are assigned rank 1"
But now let's look at the following example using some real data:
> rfecv.grid_scores_[np.nonzero(rfecv.ranking_ == 1)[0]]
0.0
while the feature with the highest ranking value has the highest *score*:
> rfecv.grid_scores_[np.argmax(rfecv.ranking_)]
0.997
Note that in the example above, the features with ranking = 1 have the lowest score.
On this matter: in this figure in the documentation, the y-axis reads "number of misclassifications", but it is plotting grid_scores_, which used 'accuracy' (?) as the scoring function. Shouldn't the y label read "accuracy" (the higher the better) instead of "number of misclassifications" (the lower the better)?
Recursive feature elimination (RFE) is a feature selection method: given an external estimator that assigns weights to features (e.g., the coefficients of a linear model), it selects features by recursively considering smaller and smaller sets of features, removing the weakest feature (or features) at each iteration until the specified number of features is reached.
The method is available via the RFE class in scikit-learn. RFE is a transformer, so it follows the familiar fit/transform pattern: the class is configured with the chosen algorithm via the "estimator" argument and the number of features to keep via the "n_features_to_select" argument. It is a popular algorithm due to its easily configurable nature and robust performance; as the name suggests, it removes features one (or a few) at a time, based on the weights assigned by the chosen model in each iteration.
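As a quick illustration of that fit/transform pattern, here is a minimal sketch reusing the Friedman #1 data from the question (the choice of SVR and of n_features_to_select=5 is just for illustration):

```python
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFE
from sklearn.svm import SVR

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

# Configure RFE with an estimator and the number of features to keep.
rfe = RFE(estimator=SVR(kernel="linear"), n_features_to_select=5, step=1)

# fit() runs the recursive elimination; transform() drops the eliminated columns.
X_reduced = rfe.fit_transform(X, y)
print(X_reduced.shape)  # (50, 5): only the 5 selected features remain
```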
You are correct that a low ranking value indicates a good feature, and that a high cross-validation score in the grid_scores_ attribute is also good; however, you are misinterpreting what the values in grid_scores_ mean. From the RFECV documentation:

grid_scores_ : array of shape [n_subsets_of_features]
    The cross-validation scores such that grid_scores_[i] corresponds to the CV score of the i-th subset of features.
Thus the grid_scores_ values don't correspond to particular features; they are the cross-validation scores for subsets of features. In the example, the subset with 5 features turns out to be the most informative, because the 5th value in grid_scores_ (the CV score for the SVR model incorporating the 5 most highly ranked features) is the largest.
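To make the indexing concrete, here is a sketch re-running the documentation example and comparing the lengths of the two attributes. (Note: newer scikit-learn releases have replaced grid_scores_ with cv_results_, so the sketch falls back to cv_results_["mean_test_score"] when grid_scores_ is absent.)

```python
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.svm import SVR

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
selector = RFECV(SVR(kernel="linear"), step=1, cv=5).fit(X, y)

# One CV score per candidate subset size: entry i is the score obtained
# with the (i + 1) most highly ranked features.
scores = getattr(selector, "grid_scores_", None)
if scores is None:  # newer scikit-learn: grid_scores_ was replaced by cv_results_
    scores = selector.cv_results_["mean_test_score"]

print(len(scores))             # 10 subset sizes (1 feature up to all 10)
print(len(selector.ranking_))  # 10 features: ranking_ is per feature, not per subset
print(selector.n_features_)    # size of the best-scoring subset (5 in the docs' example)
```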
You should also note that since the scoring metric is not explicitly specified, the scorer used is the default for SVR, which is R^2, not accuracy (which is only meaningful for classifiers).
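If you want the metric to be explicit rather than implied by the estimator's default, you can pass it yourself via the scoring parameter; a sketch using scikit-learn's "r2" scorer string (which for SVR matches the default anyway):

```python
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.svm import SVR

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

# scoring="r2" makes the regression metric explicit; leaving scoring=None
# falls back to the estimator's default scorer, which for SVR is also R^2.
# Trying scoring="accuracy" here would fail: accuracy is a classification metric.
selector = RFECV(SVR(kernel="linear"), step=1, cv=5, scoring="r2").fit(X, y)
print(selector.n_features_)
```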