Find important features for classification

I'm trying to classify some EEG data using a logistic regression model (this seems to give the best classification of my data). The data is from a multichannel EEG setup, so in essence I have a matrix of 63 x 116 x 50, that is, channels x time points x number of trials (there are two trial types with 50 trials each). I have reshaped this into a long vector, one for each trial.
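The reshaping step described above can be sketched with NumPy (the array here is filled with random placeholders, not real EEG data):

```python
import numpy as np

# Hypothetical data with the dimensions from the question:
# channels x time points x trials.
n_channels, n_times, n_trials = 63, 116, 50
eeg = np.random.randn(n_channels, n_times, n_trials)

# Move trials to the first axis, then flatten each trial into one
# long feature vector of length 63 * 116 = 7308.
X = eeg.transpose(2, 0, 1).reshape(n_trials, -1)
print(X.shape)  # → (50, 7308)
```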

What I would like to do after the classification is see which features were the most useful in classifying the trials. How can I do that, and is it possible to test the significance of these features? E.g. to say that the classification was driven mainly by N features, and these are features x to z. So I could for instance say that channel 10 at time points 90-95 was significant or important for the classification.

So is this possible or am I asking the wrong question?

Any comments or paper references are much appreciated.

asked Apr 03 '13 by Mads Jensen

1 Answer

Scikit-learn includes quite a few methods for feature ranking, among them:

  • Univariate feature selection (http://scikit-learn.org/stable/auto_examples/feature_selection/plot_feature_selection.html)
  • Recursive feature elimination (http://scikit-learn.org/stable/auto_examples/feature_selection/plot_rfe_digits.html)
  • Randomized Logistic Regression/stability selection (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RandomizedLogisticRegression.html)

(see more at http://scikit-learn.org/stable/modules/feature_selection.html)
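As a minimal sketch of the first two methods, using synthetic data in place of the flattened EEG trials (the dataset and parameters here are illustrative, not tuned for EEG):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the (trials x features) matrix.
X, y = make_classification(n_samples=100, n_features=200,
                           n_informative=10, random_state=0)

# Univariate selection: score each feature independently with an F-test.
selector = SelectKBest(f_classif, k=10).fit(X, y)
top_univariate = np.argsort(selector.scores_)[::-1][:10]

# Recursive feature elimination: repeatedly refit the model and drop
# the features with the smallest coefficients.
rfe = RFE(LogisticRegression(max_iter=1000),
          n_features_to_select=10).fit(X, y)
top_rfe = np.flatnonzero(rfe.support_)

print(sorted(top_univariate), sorted(top_rfe))
```

The F-test scores (and their p-values in `selector.pvalues_`) give one way to attach a notion of significance to individual features, with the usual caveats about multiple comparisons.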

Among those, I definitely recommend giving Randomized Logistic Regression a shot. In my experience, it consistently outperforms other methods and is very stable. Paper on this: http://arxiv.org/pdf/0809.2932v2.pdf
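Note that `RandomizedLogisticRegression` has since been removed from scikit-learn, but the underlying idea of stability selection is easy to sketch by hand: fit an L1-penalized logistic regression on many random subsamples and count how often each feature survives with a nonzero coefficient (subsample size, `C`, and the 0.8 threshold below are illustrative choices):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=50,
                           n_informative=5, random_state=0)

rng = np.random.RandomState(0)
n_rounds = 50
counts = np.zeros(X.shape[1])
for _ in range(n_rounds):
    # Fit an L1-penalized model on a random half of the trials.
    idx = rng.choice(len(y), size=len(y) // 2, replace=False)
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    clf.fit(X[idx], y[idx])
    counts += clf.coef_.ravel() != 0

# Fraction of rounds in which each feature was selected.
selection_freq = counts / n_rounds
stable = np.flatnonzero(selection_freq > 0.8)
print(stable)
```

Features that are selected in almost every subsample are the ones the classification robustly relies on, which is exactly the kind of statement the question is after.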

Edit: I have written a series of blog posts on different feature selection methods and their pros and cons, which are probably useful for answering this question in more detail:

  • http://blog.datadive.net/selecting-good-features-part-i-univariate-selection/
  • http://blog.datadive.net/selecting-good-features-part-ii-linear-models-and-regularization/
  • http://blog.datadive.net/selecting-good-features-part-iii-random-forests/
  • http://blog.datadive.net/selecting-good-features-part-iv-stability-selection-rfe-and-everything-side-by-side/
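Whichever method you use, the selected indices refer to the flattened vector, so for the EEG case you will want to map them back to (channel, time point). Assuming each trial was flattened from a (63 channels x 116 time points) matrix in C order, as NumPy's `reshape` does by default:

```python
import numpy as np

n_channels, n_times = 63, 116
feature_idx = 1234  # a hypothetical "important" feature index
channel, timepoint = np.unravel_index(feature_idx, (n_channels, n_times))
print(channel, timepoint)  # → 10 74
```

That recovers statements of exactly the form asked for in the question, e.g. "channel 10 at time point 74 was important for the classification".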
answered Oct 01 '22 by Ando Saabas