
The easiest way for getting feature names after running SelectKBest in Scikit Learn

I would like to do supervised learning.

So far I know how to run supervised learning on all features.

However, I would also like to run an experiment with the K best features.

I read the documentation and found that scikit-learn has a SelectKBest class.

Unfortunately, I am not sure how to create a new dataframe after finding those best features.

Let's assume I would like to run the experiment with the 5 best features:

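from sklearn.feature_selection import SelectKBest, f_classif

# Keep the fitted selector in its own variable so get_support() can be called on it later
select_k_best_classifier = SelectKBest(score_func=f_classif, k=5)
fit_transformed_features = select_k_best_classifier.fit_transform(features_dataframe, targeted_class)

Now if I add the next line:

dataframe = pd.DataFrame(fit_transformed_features)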

I get a new dataframe without feature names (the columns are only labelled 0 to 4).

I should replace it with:

dataframe = pd.DataFrame(fit_transformed_features, columns=features_names)

My question is: how do I create the features_names list?

I know that I should use:

 select_k_best_classifier.get_support() 

which returns an array of boolean values.

A True value at position i means that the column at index i was selected.

How should I combine this boolean array with the array of all feature names, which I can get via:

feature_names = list(features_dataframe.columns.values) 

Aviade asked Oct 03 '16 19:10


People also ask

How do you know which features are selected in SelectKBest?

What you are looking for is the get_support method of feature_selection.SelectKBest. It returns an array of booleans representing whether a given feature was selected (True) or not (False).
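
A small sketch of what get_support returns, in both its boolean and integer forms. The synthetic dataset from make_classification, the choice of k=3, and the variable names are assumptions made for illustration only; they do not come from the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data (assumption, for illustration): 100 samples, 8 features.
X, y = make_classification(
    n_samples=100, n_features=8, n_informative=3, random_state=0
)

selector = SelectKBest(score_func=f_classif, k=3).fit(X, y)

# Boolean mask: one entry per input feature, True where the feature was kept.
mask = selector.get_support()

# Integer form: the positions of the kept features.
indices = selector.get_support(indices=True)

print(mask)
print(indices)
```

Both forms describe the same selection: the integer positions are simply where the mask is True, so either can be used to index the original columns.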

What is Sklearn Feature_selection?

Feature selection. The classes in the sklearn.feature_selection module can be used for feature selection/dimensionality reduction on sample sets, either to improve estimators' accuracy scores or to boost their performance on very high-dimensional datasets.


2 Answers

This doesn't require loops.

from sklearn.feature_selection import SelectKBest, f_classif

# Create and fit selector
selector = SelectKBest(f_classif, k=5)
selector.fit(features_df, target)

# Get columns to keep and create new dataframe with those only
cols = selector.get_support(indices=True)
features_df_new = features_df.iloc[:, cols]
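
To try the index-based approach end to end, here is a minimal runnable sketch. It assumes the bundled iris dataset as a stand-in for features_df and target, and k=2 instead of 5 because iris has only four features; both are illustrative choices, not part of the answer.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Stand-in data (assumption): iris features as a DataFrame with named columns.
iris = load_iris()
features_df = pd.DataFrame(iris.data, columns=iris.feature_names)
target = iris.target

# Fit the selector, then ask for the integer positions of the kept columns.
selector = SelectKBest(f_classif, k=2)
selector.fit(features_df, target)
cols = selector.get_support(indices=True)

# iloc keeps the original column labels, so no separate name list is needed.
features_df_new = features_df.iloc[:, cols]
print(list(features_df_new.columns))
```

Because the new dataframe is sliced from the original one rather than rebuilt from the transformed array, the column names and the row index carry over unchanged.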

Reimar answered Sep 19 '22 10:09


For me this code works fine and is more 'pythonic':

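# select_k_best_classifier must be the fitted SelectKBest object,
# not the array returned by fit_transform
mask = select_k_best_classifier.get_support()
new_features = features_dataframe.columns[mask]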
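
The mask approach can be combined with the dataframe construction from the question roughly as follows. This is a sketch under assumptions: the bundled wine dataset stands in for features_dataframe and targeted_class, and the fitted selector is kept separate from the transformed array so that get_support() is available.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.feature_selection import SelectKBest, f_classif

# Stand-in data (assumption): wine features as a DataFrame with named columns.
wine = load_wine()
features_dataframe = pd.DataFrame(wine.data, columns=wine.feature_names)
targeted_class = wine.target

# Keep the selector and the transformed array in separate variables.
select_k_best_classifier = SelectKBest(score_func=f_classif, k=5)
fit_transformed_features = select_k_best_classifier.fit_transform(
    features_dataframe, targeted_class
)

# Boolean mask -> names of the selected columns, in their original order.
mask = select_k_best_classifier.get_support()
features_names = list(features_dataframe.columns[mask])

# Rebuild a labelled dataframe from the transformed array.
dataframe = pd.DataFrame(
    fit_transformed_features,
    columns=features_names,
    index=features_dataframe.index,
)
print(dataframe.head())
```

Passing the original index is optional, but it keeps the rows aligned with features_dataframe if that dataframe does not use a default 0..n-1 index.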

Dmitriy Apollonin answered Sep 23 '22 10:09