The easiest way for getting feature names after running SelectKBest in Scikit Learn

Tags:

python

pandas

scikit-learn

feature-selection

I would like to do supervised learning.

So far I only know how to do supervised learning using all of the features.

However, I would also like to run experiments with the K best features.

I read the documentation and found that scikit-learn has the SelectKBest class.

Unfortunately, I am not sure how to create a new dataframe after finding those best features.

Let's assume I would like to run an experiment with the 5 best features:

from sklearn.feature_selection import SelectKBest, f_classif

select_k_best_classifier = SelectKBest(score_func=f_classif, k=5).fit_transform(features_dataframe, targeted_class)

Now if I add the next line:

dataframe = pd.DataFrame(select_k_best_classifier)

I get a new dataframe without feature names (the columns are just labeled 0 to 4).

I should replace it with:

dataframe = pd.DataFrame(fit_transformed_features, columns=features_names)

My question is: how do I create the features_names list?

I know that I should use:

select_k_best_classifier.get_support()

which returns an array of boolean values.

Each True value in the array marks the position of a selected column.

How should I combine this boolean array with the array of all feature names, which I can get with:

feature_names = list(features_dataframe.columns.values)

476

asked Oct 03 '16 19:10

Aviade

2 Answers

This doesn't require loops.

# Create and fit selector
selector = SelectKBest(f_classif, k=5)
selector.fit(features_df, target)

# Get columns to keep and create new dataframe with those only
cols = selector.get_support(indices=True)
features_df_new = features_df.iloc[:, cols]
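
To try this approach end to end, here is a minimal, self-contained sketch. The synthetic data from make_classification and the column names feat_0 to feat_9 are assumptions for illustration only; they stand in for the real features_df and target.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in data (assumption: not part of the original question)
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=5, random_state=0
)
features_df = pd.DataFrame(X, columns=[f"feat_{i}" for i in range(10)])
target = pd.Series(y, name="target")

# Fit the selector; fit() returns the selector itself, not the data
selector = SelectKBest(f_classif, k=5)
selector.fit(features_df, target)

# Integer positions of the selected columns
cols = selector.get_support(indices=True)

# Slicing the original dataframe keeps the column names and the index
features_df_new = features_df.iloc[:, cols]

print(list(features_df_new.columns))
```

Because the new dataframe is sliced from the original one, the feature names come along for free and no separate list of names has to be built.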

186

answered Sep 19 '22 10:09

Reimar

For me this code works fine and is more 'pythonic':

mask = select_k_best_classifier.get_support()
new_features = features_dataframe.columns[mask]
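
A minimal sketch of how the mask can be used to build the named dataframe the question asks for. Note that in the question the result of fit_transform was assigned to select_k_best_classifier; fit_transform returns a NumPy array, so here the fitted selector and the transformed array are kept in separate variables. The names selector and transformed, the synthetic data, and the column names col_0 to col_7 are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in data (assumption: not part of the original question)
X, y = make_classification(
    n_samples=100, n_features=8, n_informative=4, random_state=1
)
features_dataframe = pd.DataFrame(X, columns=[f"col_{i}" for i in range(8)])
targeted_class = y

# Keep the selector object; get_support() lives on it, not on the array
selector = SelectKBest(score_func=f_classif, k=5)
transformed = selector.fit_transform(features_dataframe, targeted_class)

# Boolean mask over the original columns, True for each selected feature
mask = selector.get_support()
new_features = features_dataframe.columns[mask]

# Rebuild a dataframe from the transformed array with the original names
dataframe = pd.DataFrame(
    transformed, columns=new_features, index=features_dataframe.index
)

print(dataframe.head())
```

The selected columns come back in their original order, so the names taken from the mask line up with the columns of the transformed array.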

answered Sep 23 '22 10:09

Dmitriy Apollonin
