
scikit-learn: get selected features when using SelectKBest within pipeline

I am trying to do feature selection as part of a scikit-learn pipeline in a multi-label scenario. My goal is to select the best k features, for some given k.

It might be simple, but I don't understand how to get the indices of the selected features in such a scenario.

In a regular (single-label) scenario I could do something like this:

from sklearn.feature_selection import SelectKBest, f_classif

anova_filter = SelectKBest(f_classif, k=10)
anova_filter.fit_transform(data.X, data.Y)
anova_filter.get_support()
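For reference, get_support also accepts indices=True, which returns the integer indices of the selected columns instead of a boolean mask:

anova_filter.get_support(indices=True)  # array of the k selected column indices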

But in a multilabel scenario my label matrix has dimensions #samples x #unique_labels, so fit and fit_transform raise the following exception: ValueError: bad input shape

which makes sense, because they expect labels of shape [#samples].
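A minimal sketch of the shape mismatch on toy data (the MultiLabelBinarizer labels and array sizes below are purely illustrative):

import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.feature_selection import SelectKBest, f_classif

X = np.random.rand(5, 20)                               # 5 samples, 20 features
Y = MultiLabelBinarizer().fit_transform(
    [{'a'}, {'a', 'b'}, {'b'}, {'c'}, {'a', 'c'}])      # shape (5, 3): samples x unique labels
try:
    SelectKBest(f_classif, k=10).fit(X, Y)              # f_classif expects y of shape (n_samples,)
except ValueError as e:
    print(e)                                            # e.g. "bad input shape (5, 3)" on older versions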

In the multilabel scenario, it makes sense to do something like this:

from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsRestClassifier

clf = Pipeline([('f_classif', SelectKBest(f_classif, k=10)), ('svm', LinearSVC())])

multiclf = OneVsRestClassifier(clf, n_jobs=-1)

multiclf.fit(data.X, data.Y)

But then the object I get back is of type sklearn.multiclass.OneVsRestClassifier, which doesn't have a get_support method. How do I get at the trained SelectKBest model when it is used inside a pipeline?

asked Sep 12 '15 by Delli22


People also ask

What is SelectKBest feature selection?

Feature selection is a technique where we choose the features in our data that contribute most to the target variable. In other words, we choose the best predictors for the target variable; for supervised models, SelectKBest does this by keeping only the top-scoring features.

How does univariate feature selection work?

Univariate feature selection works by selecting the best features based on univariate statistical tests. It can be seen as a preprocessing step to an estimator. Scikit-learn exposes feature selection routines as objects that implement the transform method: SelectKBest removes all but the k highest-scoring features.
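As a small single-label illustration (using the bundled iris dataset; k=2 is arbitrary here):

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)                  # 150 samples, 4 features
X_new = SelectKBest(f_classif, k=2).fit_transform(X, y)
print(X_new.shape)                                 # (150, 2): only the 2 best-scoring features remain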

What is the benefit of using the Scikit-learn pipeline utility for data preprocessing?

The Scikit-learn pipeline is a tool that chains all steps of the workflow together for a more streamlined procedure. The key benefit of building a pipeline is improved readability. Pipelines are able to execute a series of transformations with one call, allowing users to attain results with less code.
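For example, a selector and a classifier can be chained and fit with a single call (a generic sketch on synthetic data, unrelated to the question's dataset):

from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=100, n_features=50, random_state=0)
pipe = Pipeline([('select', SelectKBest(f_classif, k=10)),
                 ('clf', LinearSVC())])
pipe.fit(X, y)           # feature selection and the classifier are fit in one call
pipe.predict(X[:5])      # new data passes through the same selection before prediction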


1 Answer

The way you set it up, there will be one SelectKBest per class. Is that what you intended? You can get them via

multiclf.estimators_[i].named_steps['f_classif'].get_support()
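For example, to collect one mask per class (a sketch assuming multiclf has been fit as in the question):

# one fitted copy of the pipeline per class lives in multiclf.estimators_
masks = [est.named_steps['f_classif'].get_support()
         for est in multiclf.estimators_]
# masks[i] is a boolean array marking the k features chosen for the i-th class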

If you want one feature selection for all the OvR models, you can do

clf = Pipeline([('f_classif', SelectKBest(f_classif, k=10)),
                ('svm', OneVsRestClassifier(LinearSVC()))])

and get the single feature selection with

clf.named_steps['f_classif'].get_support()
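If you prefer column indices or feature names over a boolean mask (feature_names below is a hypothetical list you would provide yourself):

import numpy as np

mask = clf.named_steps['f_classif'].get_support()
selected_idx = np.where(mask)[0]   # indices of the k kept columns
# selected_names = [feature_names[i] for i in selected_idx]  # hypothetical name lookup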
answered Oct 18 '22 by Andreas Mueller