Find important features for classification

I'm trying to classify some EEG data using a logistic regression model (this seems to give the best classification of my data). The data is from a multichannel EEG setup, so in essence I have a matrix of 63 x 116 x 50, that is, channels x time points x number of trials (there are two trial types with 50 trials each). I have reshaped this into a long vector, one for each trial.
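The reshaping step described above can be sketched with NumPy (the array here is filled with random placeholders, not real EEG data):

```python
import numpy as np

# Hypothetical data with the dimensions from the question:
# channels x time points x trials.
n_channels, n_times, n_trials = 63, 116, 50
eeg = np.random.randn(n_channels, n_times, n_trials)

# Move trials to the first axis, then flatten each trial into one
# long feature vector of length 63 * 116 = 7308.
X = eeg.transpose(2, 0, 1).reshape(n_trials, -1)
print(X.shape)  # → (50, 7308)
```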

What I would like to do after the classification is see which features were the most useful in classifying the trials. How can I do that, and is it possible to test the significance of these features? E.g. to say that the classification was driven mainly by N features, and these are features x to z. So I could for instance say that channel 10 at time points 90-95 was significant or important for the classification.

So is this possible or am I asking the wrong question?

Any comments or paper references are much appreciated.

asked Apr 03 '13 by Mads Jensen

1 Answer

Scikit-learn includes quite a few methods for feature ranking, among them:

  • Univariate feature selection (http://scikit-learn.org/stable/auto_examples/feature_selection/plot_feature_selection.html)
  • Recursive feature elimination (http://scikit-learn.org/stable/auto_examples/feature_selection/plot_rfe_digits.html)
  • Randomized Logistic Regression/stability selection (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RandomizedLogisticRegression.html)

(see more at http://scikit-learn.org/stable/modules/feature_selection.html)
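As a minimal sketch of the first two methods, using synthetic data in place of the flattened EEG trials (the dataset and parameters here are illustrative, not tuned for EEG):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the (trials x features) matrix.
X, y = make_classification(n_samples=100, n_features=200,
                           n_informative=10, random_state=0)

# Univariate selection: score each feature independently with an F-test.
selector = SelectKBest(f_classif, k=10).fit(X, y)
top_univariate = np.argsort(selector.scores_)[::-1][:10]

# Recursive feature elimination: repeatedly refit the model and drop
# the features with the smallest coefficients.
rfe = RFE(LogisticRegression(max_iter=1000),
          n_features_to_select=10).fit(X, y)
top_rfe = np.flatnonzero(rfe.support_)

print(sorted(top_univariate), sorted(top_rfe))
```

The F-test scores (and their p-values in `selector.pvalues_`) give one way to attach a notion of significance to individual features, with the usual caveats about multiple comparisons.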

Among those, I definitely recommend giving Randomized Logistic Regression a shot. In my experience, it consistently outperforms other methods and is very stable. Paper on this: http://arxiv.org/pdf/0809.2932v2.pdf
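Note that `RandomizedLogisticRegression` has since been removed from scikit-learn, but the underlying idea of stability selection is easy to sketch by hand: fit an L1-penalized logistic regression on many random subsamples and count how often each feature survives with a nonzero coefficient (subsample size, `C`, and the 0.8 threshold below are illustrative choices):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=50,
                           n_informative=5, random_state=0)

rng = np.random.RandomState(0)
n_rounds = 50
counts = np.zeros(X.shape[1])
for _ in range(n_rounds):
    # Fit an L1-penalized model on a random half of the trials.
    idx = rng.choice(len(y), size=len(y) // 2, replace=False)
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    clf.fit(X[idx], y[idx])
    counts += clf.coef_.ravel() != 0

# Fraction of rounds in which each feature was selected.
selection_freq = counts / n_rounds
stable = np.flatnonzero(selection_freq > 0.8)
print(stable)
```

Features that are selected in almost every subsample are the ones the classification robustly relies on, which is exactly the kind of statement the question is after.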

Edit: I have written a series of blog posts on different feature selection methods and their pros and cons, which are probably useful for answering this question in more detail:

  • http://blog.datadive.net/selecting-good-features-part-i-univariate-selection/
  • http://blog.datadive.net/selecting-good-features-part-ii-linear-models-and-regularization/
  • http://blog.datadive.net/selecting-good-features-part-iii-random-forests/
  • http://blog.datadive.net/selecting-good-features-part-iv-stability-selection-rfe-and-everything-side-by-side/
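Whichever method you use, the selected indices refer to the flattened vector, so for the EEG case you will want to map them back to (channel, time point). Assuming each trial was flattened from a (63 channels x 116 time points) matrix in C order, as NumPy's `reshape` does by default:

```python
import numpy as np

n_channels, n_times = 63, 116
feature_idx = 1234  # a hypothetical "important" feature index
channel, timepoint = np.unravel_index(feature_idx, (n_channels, n_times))
print(channel, timepoint)  # → 10 74
```

That recovers statements of exactly the form asked for in the question, e.g. "channel 10 at time point 74 was important for the classification".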
answered Oct 01 '22 by Ando Saabas