Ranking and scores in Recursive Feature Elimination (RFE) in scikit-learn

I am trying to understand how to read grid_scores_ and ranking_ values in RFECV. Here is the main example from the documentation:

from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.svm import SVR
X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
estimator = SVR(kernel="linear")
selector = RFECV(estimator, step=1, cv=5)
selector = selector.fit(X, y)
selector.support_ 
array([ True,  True,  True,  True,  True,
        False, False, False, False, False], dtype=bool)

selector.ranking_
array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])

How am I supposed to read ranking_ and grid_scores_? Is a lower ranking value better (or vice versa)? The reason I ask is that I have noticed that the features with the highest ranking values typically have the highest scores in grid_scores_.

However, if a feature has ranking = 1, shouldn't that mean it was ranked as the best of the group? This is also what the documentation says:

"Selected (i.e., estimated best) features are assigned rank 1"
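To make the "rank 1 = best" convention concrete, here is a minimal toy sketch of how RFE-style ranking comes about (this is an illustration with a plain least-squares fit, not sklearn's actual implementation): features eliminated first end up with the largest rank, and the survivors all share rank 1.

```python
import numpy as np

def toy_rfe_ranking(X, y, n_features_to_select):
    """Toy illustration of RFE's ranking_ (not sklearn's actual code):
    repeatedly fit a least-squares model, drop the feature with the
    smallest absolute coefficient, and record the elimination order."""
    n_features = X.shape[1]
    remaining = list(range(n_features))
    ranking = np.ones(n_features, dtype=int)
    rank = n_features - n_features_to_select + 1  # first feature out gets the top rank
    while len(remaining) > n_features_to_select:
        coef, *_ = np.linalg.lstsq(X[:, remaining], y, rcond=None)
        worst = remaining[int(np.argmin(np.abs(coef)))]
        ranking[worst] = rank  # eliminated earlier -> larger (worse) rank
        rank -= 1
        remaining.remove(worst)
    return ranking  # survivors keep rank 1: rank 1 = kept = best

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 3 * X[:, 0] + 2 * X[:, 1] + 0.1 * X[:, 2]  # feature 3 is pure noise
ranking = toy_rfe_ranking(X, y, n_features_to_select=2)
# expected: features 0 and 1 are kept (rank 1), feature 2 is eliminated
# second (rank 2), and feature 3 is eliminated first (rank 3)
```

So a rank of 1 marks a selected feature, and larger ranks mark features that were discarded earlier in the elimination.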

But now let's look at the following example using some real data:

> rfecv.grid_scores_[np.nonzero(rfecv.ranking_ == 1)[0]]
0.0

while the feature with the highest ranking value has the highest *score*.

> rfecv.grid_scores_[np.argmax(rfecv.ranking_ )]
0.997

Note that in the example above, the features with ranking = 1 have the lowest score.

Figure in the documentation:

Related to this: in this figure from the documentation, the y axis reads "number of misclassifications", but it is plotting grid_scores_, which used 'accuracy' as the scoring function. Shouldn't the y label read "accuracy" (the higher the better) instead of "number of misclassifications" (the lower the better)?
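One way to avoid that ambiguity is to plot grid_scores_ yourself with an explicit axis label. A minimal sketch (the scores array below is hypothetical, just for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical cross-validation scores, one per subset size (illustration only)
grid_scores = np.array([0.35, 0.60, 0.78, 0.90, 0.997,
                        0.95, 0.93, 0.92, 0.91, 0.90])

fig, ax = plt.subplots()
ax.plot(np.arange(1, len(grid_scores) + 1), grid_scores, marker="o")
ax.set_xlabel("Number of features selected")
ax.set_ylabel("Cross-validation score")  # higher is better for accuracy/R^2
```

With a label like "Cross-validation score", higher is unambiguously better, whatever scorer was used.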

Amelio Vazquez-Reina asked Aug 14 '13



1 Answer

You are correct that a low ranking value indicates a good feature, and that a high cross-validation score in the grid_scores_ attribute is also good. However, you are misinterpreting what the values in grid_scores_ mean. From the RFECV documentation:

grid_scores_

array of shape [n_subsets_of_features]

The cross-validation scores such that grid_scores_[i] corresponds to the CV score of the i-th subset of features.

Thus the grid_scores_ values don't correspond to individual features; they are the cross-validation error metrics for subsets of features. In the example, the subset with 5 features turns out to be the most informative one, because the 5th value in grid_scores_ (the CV score of the SVR model fit on the 5 most highly ranked features) is the largest.
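The indexing can be checked directly on the documentation example. A sketch (note that newer scikit-learn versions removed grid_scores_ and expose the same per-subset scores as cv_results_["mean_test_score"], so the code below tries both):

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.svm import SVR

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
selector = RFECV(SVR(kernel="linear"), step=1, cv=5).fit(X, y)

# grid_scores_ was deprecated/removed in newer sklearn; fall back to cv_results_
scores = getattr(selector, "grid_scores_", None)
if scores is None:
    scores = selector.cv_results_["mean_test_score"]

# With step=1 there is one score per subset size, so len(scores) == 10
# even though each score describes a *subset*, not a single feature.
# scores[i] is the CV score of the subset of the (i + 1) top-ranked features,
# and RFECV keeps the subset size whose score is largest.
best_subset_size = int(np.argmax(scores)) + 1
```

So scores[4] is the score of the 5-feature subset, which is exactly the subset flagged True in support_.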

You should also note that, since no scoring metric is explicitly specified, the scorer used is the default for SVR, which is R^2, not accuracy (accuracy is only meaningful for classifiers).
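This default is easy to verify: for sklearn regressors, the estimator's score method is the coefficient of determination R^2, which is what RFECV falls back to when scoring is not given. A quick check:

```python
from sklearn.datasets import make_friedman1
from sklearn.metrics import r2_score
from sklearn.svm import SVR

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
est = SVR(kernel="linear").fit(X, y)

# SVR.score computes R^2 on the given data, identical to r2_score
default_score = est.score(X, y)
explicit_r2 = r2_score(y, est.predict(X))
```

If you want the figure's y axis to mean something specific, pass scoring explicitly to RFECV (e.g. scoring="r2" for regression) rather than relying on the default.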

DavidS answered Nov 14 '22