I'm new to machine learning. I'm preparing my data for classification with scikit-learn's SVM. To select the best features, I used the following method:
SelectKBest(chi2, k=10).fit_transform(A1, A2)
Since my dataset contains negative values, I get the following error:
ValueError                                Traceback (most recent call last)
/media/5804B87404B856AA/TFM_UC3M/test2_v.py in <module>()
----> 1
      2
      3
      4
      5

/usr/local/lib/python2.6/dist-packages/sklearn/base.pyc in fit_transform(self, X, y, **fit_params)
    427         else:
    428             # fit method of arity 2 (supervised transformation)
--> 429             return self.fit(X, y, **fit_params).transform(X)
    430
    431

/usr/local/lib/python2.6/dist-packages/sklearn/feature_selection/univariate_selection.pyc in fit(self, X, y)
    300         self._check_params(X, y)
    301
--> 302         self.scores_, self.pvalues_ = self.score_func(X, y)
    303         self.scores_ = np.asarray(self.scores_)
    304         self.pvalues_ = np.asarray(self.pvalues_)

/usr/local/lib/python2.6/dist-packages/sklearn/feature_selection/univariate_selection.pyc in chi2(X, y)
    190     X = atleast2d_or_csr(X)
    191     if np.any((X.data if issparse(X) else X) < 0):
--> 192         raise ValueError("Input X must be non-negative.")
    193
    194     Y = LabelBinarizer().fit_transform(y)

ValueError: Input X must be non-negative.
Can someone tell me how I can transform my data?
There are two main types of feature selection techniques: supervised and unsupervised, and supervised methods may be divided into wrapper, filter and intrinsic.
Through scikit-learn, we can implement various machine learning models for regression, classification, clustering, and statistical tools for analyzing these models. It also provides functionality for dimensionality reduction, feature selection, feature extraction, ensemble techniques, and inbuilt datasets.
The error message "Input X must be non-negative" says it all: Pearson's chi-square test (goodness of fit) does not apply to negative values. This is logical, because the chi-square test assumes a frequency distribution, and a frequency can't be a negative number. Consequently, sklearn.feature_selection.chi2 checks that the input is non-negative.
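To see the check in isolation, here is a minimal sketch on made-up data (the arrays X and y below are hypothetical, not the asker's A1 and A2). The exact wording of the message may differ between scikit-learn versions, so the example only relies on a ValueError being raised:

```python
import numpy as np
from sklearn.feature_selection import chi2

# Hypothetical toy data: one entry in X is negative.
X = np.array([[1.0, -2.0],
              [3.0, 4.0],
              [5.0, 6.0],
              [7.0, 8.0]])
y = np.array([0, 1, 0, 1])

raised = False
try:
    chi2(X, y)
except ValueError as exc:
    # chi2 rejects any input that contains negative values.
    raised = True
    print("chi2 refused the input:", exc)
```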
You say that your features are "min, max, mean, median and FFT of accelerometer signal". In many cases it is quite safe to simply shift each feature so that all its values are non-negative, or even to normalize each feature to the [0, 1] interval, as suggested by EdChum.
If data transformation is for some reason not possible (e.g. a negative value is an important factor), you should pick another statistic to score your features:
sklearn.feature_selection.f_classif computes the ANOVA F-value
sklearn.feature_selection.mutual_info_classif computes the mutual information

Since the whole point of this procedure is to prepare the features for another method, it's not a big deal which one you pick; the end result is usually the same or very close.
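Both alternatives plug into SelectKBest the same way chi2 does and accept negative inputs. A minimal sketch on synthetic data follows (X and y are made up for illustration; mutual_info_classif is only available in newer scikit-learn releases than the one in the traceback):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif

# Synthetic data with negative values, standing in for the real features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = rng.integers(0, 2, size=100)

# Score features with the ANOVA F-value and keep the 10 best.
X_anova = SelectKBest(f_classif, k=10).fit_transform(X, y)

# Score features with estimated mutual information and keep the 10 best.
X_mi = SelectKBest(mutual_info_classif, k=10).fit_transform(X, y)

print(X_anova.shape, X_mi.shape)
```

Note that mutual_info_classif uses a randomized estimator, so its scores can vary slightly between runs unless a random_state is fixed.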