Feature selection using scikit-learn

I'm new to machine learning. I'm preparing my data for classification with a scikit-learn SVM. To select the best features, I used the following method:

SelectKBest(chi2, k=10).fit_transform(A1, A2) 

Since my dataset contains negative values, I get the following error:

ValueError                                Traceback (most recent call last)
/media/5804B87404B856AA/TFM_UC3M/test2_v.py in <module>()
----> 1
      2
      3
      4
      5

/usr/local/lib/python2.6/dist-packages/sklearn/base.pyc in fit_transform(self, X, y, **fit_params)
    427         else:
    428             # fit method of arity 2 (supervised transformation)
--> 429             return self.fit(X, y, **fit_params).transform(X)
    430
    431

/usr/local/lib/python2.6/dist-packages/sklearn/feature_selection/univariate_selection.pyc in fit(self, X, y)
    300         self._check_params(X, y)
    301
--> 302         self.scores_, self.pvalues_ = self.score_func(X, y)
    303         self.scores_ = np.asarray(self.scores_)
    304         self.pvalues_ = np.asarray(self.pvalues_)

/usr/local/lib/python2.6/dist-packages/sklearn/feature_selection/univariate_selection.pyc in chi2(X, y)
    190     X = atleast2d_or_csr(X)
    191     if np.any((X.data if issparse(X) else X) < 0):
--> 192         raise ValueError("Input X must be non-negative.")
    193
    194     Y = LabelBinarizer().fit_transform(y)

ValueError: Input X must be non-negative.

Can someone tell me how I can transform my data?

sara asked Sep 11 '14 15:09



1 Answer

The error message "Input X must be non-negative" says it all: Pearson's chi-square test (goodness of fit) does not apply to negative values. This is logical, because the chi-square test assumes its inputs are frequencies, and a frequency can't be negative. Consequently, sklearn.feature_selection.chi2 checks that the input is non-negative and raises a ValueError otherwise.

You say that your features are "min, max, mean, median and FFT of accelerometer signal". In many cases, it is quite safe to simply shift each feature so that all of its values are non-negative, or even to normalize it to the [0, 1] interval, as suggested by EdChum.
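As a minimal sketch of the normalization option, assuming the data is a dense numeric array: the random A1 and A2 below are only placeholders for the arrays in the question, and MinMaxScaler is one way to do the rescaling, not necessarily what EdChum had in mind.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

# Placeholder data standing in for the question's A1 (features) and A2 (labels).
rng = np.random.RandomState(0)
A1 = rng.randn(100, 20)            # contains negative values
A2 = rng.randint(0, 2, size=100)   # binary class labels

# Rescale each feature (column) independently to the [0, 1] interval.
A1_scaled = MinMaxScaler().fit_transform(A1)

# chi2 accepts the rescaled input because every value is now non-negative.
selector = SelectKBest(chi2, k=10)
A1_selected = selector.fit_transform(A1_scaled, A2)

print(A1_selected.shape)
```

If you later evaluate on held-out data, it is probably better to fit the scaler on the training split only and reuse it on the test split.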

If data transformation is for some reason not possible (e.g. a negative value is an important factor), you should pick another statistic to score your features:

  • sklearn.feature_selection.f_classif computes the ANOVA F-value
  • sklearn.feature_selection.mutual_info_classif computes the mutual information
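Either score function can be passed to SelectKBest in place of chi2, and neither requires non-negative input. A rough sketch, again with random placeholder arrays instead of the real A1 and A2:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif

# Placeholder data standing in for the question's A1 (features) and A2 (labels).
rng = np.random.RandomState(0)
A1 = rng.randn(100, 20)            # negative values are fine here
A2 = rng.randint(0, 2, size=100)   # binary class labels

# Select the 10 features with the highest ANOVA F-value.
anova_selected = SelectKBest(f_classif, k=10).fit_transform(A1, A2)

# Select the 10 features with the highest estimated mutual information.
mi_selected = SelectKBest(mutual_info_classif, k=10).fit_transform(A1, A2)

print(anova_selected.shape, mi_selected.shape)
```

Note that mutual_info_classif is a randomized estimator, so the selected features can vary slightly between runs unless you fix its random_state.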

Since the whole point of this procedure is to prepare the features for another method, it's not a big deal which one you pick: the end result is usually the same or very close.

Maxim answered Sep 20 '22 06:09