Feature selection using scikit-learn

Tags: python, machine-learning, scikit-learn, feature-selection, chi-squared

I'm new to machine learning. I'm preparing my data for classification with scikit-learn's SVM. To select the best features, I used the following method:

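from sklearn.feature_selection import SelectKBest, chi2
X_new = SelectKBest(chi2, k=10).fit_transform(A1, A2)  # A1: feature matrix, A2: class labels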

Since my dataset contains negative values, I get the following error:

ValueError                                Traceback (most recent call last)
/media/5804B87404B856AA/TFM_UC3M/test2_v.py in <module>()
----> 1
      2
      3
      4
      5

/usr/local/lib/python2.6/dist-packages/sklearn/base.pyc in fit_transform(self, X, y, **fit_params)
    427         else:
    428             # fit method of arity 2 (supervised transformation)
--> 429             return self.fit(X, y, **fit_params).transform(X)
    430
    431

/usr/local/lib/python2.6/dist-packages/sklearn/feature_selection/univariate_selection.pyc in fit(self, X, y)
    300         self._check_params(X, y)
    301
--> 302         self.scores_, self.pvalues_ = self.score_func(X, y)
    303         self.scores_ = np.asarray(self.scores_)
    304         self.pvalues_ = np.asarray(self.pvalues_)

/usr/local/lib/python2.6/dist-packages/sklearn/feature_selection/univariate_selection.pyc in chi2(X, y)
    190     X = atleast2d_or_csr(X)
    191     if np.any((X.data if issparse(X) else X) < 0):
--> 192         raise ValueError("Input X must be non-negative.")
    193
    194     Y = LabelBinarizer().fit_transform(y)

ValueError: Input X must be non-negative.

Can someone tell me how I can transform my data?


asked Sep 11 '14 15:09

sara

1 Answer

The error message "Input X must be non-negative" says it all: Pearson's chi-squared test (goodness of fit) does not apply to negative values. This is logical, because the chi-squared test assumes a frequency distribution, and a frequency can't be negative. Consequently, sklearn.feature_selection.chi2 checks that the input is non-negative and raises a ValueError otherwise.
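To make the "frequencies" point concrete, here is a minimal sketch that recomputes the chi-squared scores by hand. It assumes the approach scikit-learn's `chi2` takes (as I read its source): each feature's values are summed per class as "observed frequencies" and compared with the frequencies expected if the feature were independent of the class. The toy count matrix `X` and labels `y` are made up for illustration.

```python
import numpy as np
from sklearn.feature_selection import chi2

# Hypothetical toy count data: 4 samples x 3 features, 2 classes
X = np.array([[1, 0, 3],
              [2, 1, 0],
              [0, 4, 1],
              [3, 0, 2]], dtype=float)
y = np.array([0, 0, 1, 1])

# One-hot encode the labels: Y[i, c] == 1 if sample i belongs to class c
classes = np.unique(y)
Y = (y[:, None] == classes[None, :]).astype(float)

# "Observed frequencies": per-class sum of each feature (n_classes x n_features)
observed = Y.T @ X
# "Expected frequencies" under independence: class proportion times feature total
expected = Y.mean(axis=0)[:, None] * X.sum(axis=0)[None, :]
manual_scores = ((observed - expected) ** 2 / expected).sum(axis=0)

scores, pvalues = chi2(X, y)
print(np.allclose(manual_scores, scores))  # True

# A negative entry would act as a negative "count", so chi2 rejects it
try:
    chi2(X - 2, y)
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```

Summing values per class only makes sense when every value is a count-like, non-negative quantity, which is exactly what the check in the traceback enforces.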

You say that your features are "min, max, mean, median and FFT of accelerometer signal". In many cases, it may be quite safe to simply shift each feature so that all of its values are non-negative, or even to normalize it to the [0, 1] interval, as suggested by EdChum.
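Both transformations can be sketched as follows. This assumes the features are in a NumPy array; the random data is a hypothetical stand-in for the accelerometer features, and `MinMaxScaler` is one way (my choice, not necessarily EdChum's) to do the [0, 1] normalization.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
# Hypothetical stand-in for the real features: 100 samples x 20 features drawn
# from a standard normal distribution, so roughly half the values are negative
X = rng.normal(size=(100, 20))
y = rng.integers(0, 2, size=100)  # hypothetical binary class labels

# Option 1: shift each feature by its own minimum so its smallest value becomes 0
X_shifted = X - X.min(axis=0)

# Option 2: rescale each feature to the [0, 1] interval
X_scaled = MinMaxScaler().fit_transform(X)

# chi2 now accepts the input (either option works; the scaled one is used here)
X_new = SelectKBest(chi2, k=10).fit_transform(X_scaled, y)
print(X_new.shape)  # (100, 10)
```

In a real train/test setup, compute the minimum (or fit the scaler) on the training data only and reuse it on the test data, so that test-set statistics don't leak into the transformation.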

If transforming the data is not possible for some reason (e.g., negative values carry important meaning), you should pick another statistic to score your features:

sklearn.feature_selection.f_classif computes the ANOVA F-value
sklearn.feature_selection.mutual_info_classif computes the mutual information
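Both alternatives plug into `SelectKBest` the same way as `chi2` and accept negative values directly. A minimal sketch on hypothetical random data; wrapping `mutual_info_classif` in `functools.partial` to fix its `random_state` is my own choice for reproducibility, since its estimate involves randomness.

```python
from functools import partial

import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))    # hypothetical features, including negative values
y = rng.integers(0, 2, size=100)  # hypothetical binary class labels

# ANOVA F-test: no non-negativity requirement, works on the raw features
X_f = SelectKBest(f_classif, k=10).fit_transform(X, y)

# Mutual information: also works on the raw features; random_state fixed
# so the selected features are the same from run to run
mi = partial(mutual_info_classif, random_state=0)
X_mi = SelectKBest(mi, k=10).fit_transform(X, y)

print(X_f.shape, X_mi.shape)  # (100, 10) (100, 10)
```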

Since the whole point of this procedure is to prepare the features for another method, it doesn't matter much which one you pick; the end result is usually the same or very close.
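One way to check this for your own data is to compare the scoring functions inside the actual downstream model. The sketch below assumes an `SVC` classifier (as in the question) with default settings and uses hypothetical random data, so the accuracies themselves mean nothing. The scaler sits inside the pipeline so it is fitted on each training fold only, which also guarantees that `chi2` sees non-negative data when the selector is fitted.

```python
from functools import partial

import numpy as np
from sklearn.feature_selection import SelectKBest, chi2, f_classif, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))    # hypothetical features with negative values
y = rng.integers(0, 2, size=100)  # hypothetical binary class labels

score_funcs = {
    "chi2": chi2,
    "f_classif": f_classif,
    # fixed random_state so the mutual-information estimate is reproducible
    "mutual_info": partial(mutual_info_classif, random_state=0),
}

results = {}
for name, score_func in score_funcs.items():
    # scale to [0, 1] -> keep the 10 best features -> classify with an SVM
    pipe = make_pipeline(MinMaxScaler(), SelectKBest(score_func, k=10), SVC())
    scores = cross_val_score(pipe, X, y, cv=5)  # 5-fold cross-validated accuracy
    results[name] = scores.mean()
    print(f"{name}: mean accuracy {results[name]:.3f}")
```

On real data, similar cross-validated scores across the three rows would support picking whichever scoring function is most convenient.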

answered Sep 20 '22 06:09

Maxim
