Right order of doing feature selection, PCA and normalization?

I know that feature selection helps me remove features with low contribution. I know that PCA combines possibly correlated features into a smaller set of uncorrelated components, reducing the number of dimensions. I know that normalization transforms features to the same scale.

But is there a recommended order for these three steps? Logically, I would think I should first weed out bad features via feature selection, then normalize them, and finally use PCA to reduce the dimensions and make the features as independent of each other as possible.

Is this logic correct?

Bonus question - are there any other preprocessing or transformation steps I should apply to the features before feeding them into the estimator?

asked Sep 05 '17 by shikhanshu

People also ask

Should I do normalization before PCA?

Yes, it is necessary to normalize data before performing PCA. PCA calculates a new projection of your data set, and the new axes are based on the standard deviation of your variables.
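A quick illustration of this point (with made-up, hypothetical data): a feature with a large raw scale dominates the principal components until you standardize.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(0, 1000, 500),  # large-scale feature
    rng.normal(0, 1, 500),     # small-scale feature
])

print(PCA().fit(X).explained_variance_ratio_)
# ~[1.0, 0.0] -- the first component is essentially just the large-scale feature

X_scaled = StandardScaler().fit_transform(X)
print(PCA().fit(X_scaled).explained_variance_ratio_)
# ~[0.5, 0.5] -- after standardization, each feature contributes comparably
```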

Which comes first feature selection or feature engineering?

Similar to feature engineering, different feature selection algorithms are optimal for different types of data. And as always, the goals of the data scientist have to be accounted for as well when choosing the feature selection algorithm. But before all of this, feature engineering should always come first.

Do we need normalization after PCA?

Normalization is important in PCA since it is a variance-maximizing exercise: it projects your original data onto the directions that maximize the variance.

Should I scale data before feature selection?

Some feature selection methods depend on the scale of the data, in which case it is best to scale beforehand. Other methods don't depend on the scale, in which case it doesn't matter. Either way, all preprocessing should be done after the train/test split, fitting only on the training data.
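A small sketch of "preprocess after the split" (the data here is a placeholder): fit the scaler on the training set only, then apply the same fitted transform to the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.rand(100, 5)  # placeholder data
X_train, X_test = train_test_split(X, random_state=0)

scaler = StandardScaler().fit(X_train)  # statistics come from the train set only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)     # no information leaks from the test set
```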


3 Answers

If I were building a classifier of some sort, I would personally use this order:

  1. Normalization
  2. PCA
  3. Feature Selection

Normalization: You would normalize first to get the data into reasonable bounds. If you have data (x, y) where x ranges from -1000 to +1000 and y ranges from -1 to +1, any distance metric would automatically treat a change in y as less significant than a change in x. We don't know yet whether that is the case, so we normalize the data.

PCA: Uses the eigenvalue decomposition of the data's covariance matrix to find an orthogonal basis that describes the variance in the data points. If you have 4 features, PCA may show you that only 2 of them really differentiate the data points, which brings us to the last step.

Feature Selection: Once you have a coordinate space that better describes your data, you can select which features are salient. Typically you'd use the largest eigenvalues (EVs) and their corresponding eigenvectors from PCA for your representation. Since larger EVs mean there is more variance in that data direction, you get more granularity in isolating features. This is a good method to reduce the number of dimensions of your problem.

Of course, this could change from problem to problem, but it is a reasonable generic guide; a minimal sketch of the order follows below.
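Here is one way to express that Normalization → PCA → Feature Selection order as a scikit-learn pipeline. The dataset, component counts, and classifier are placeholder choices for illustration, not anything prescribed above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each step is fit on the training data only; the pipeline then applies
# the same fitted transformations to the test data.
pipe = Pipeline([
    ("scale", StandardScaler()),               # 1. normalization
    ("pca", PCA(n_components=10)),             # 2. orthogonal basis, ordered by variance
    ("select", SelectKBest(f_classif, k=5)),   # 3. keep the most salient components
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_train, y_train)
print("test accuracy:", pipe.score(X_test, y_test))
```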

answered Oct 19 '22 by andrew


Generally speaking, normalization is needed before PCA. The key question is the order of feature selection, and that depends on the feature selection method.

A simple feature selection method is to check whether the variance or standard deviation of a feature is small. If these values are relatively small, the feature may not help the classifier. But if you normalize before doing this, the standard deviations and variances become smaller (generally less than 1), which results in very small differences in std or variance between the different features. If you use zero-mean, unit-variance normalization, every feature ends up with mean 0 and std 1, so this criterion can no longer distinguish them. In that case, it would be bad to normalize before feature selection.
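A minimal sketch of that point (with made-up data): variance-based selection becomes meaningless after standardization, because every standardized feature has variance 1.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(0, 10.0, 200),   # informative, high-variance feature
    rng.normal(0, 0.01, 200),   # near-constant feature
])

print(np.var(X, axis=0))        # very different variances -> easy to threshold
print(VarianceThreshold(threshold=0.1).fit(X).get_support())  # [ True False]

X_std = StandardScaler().fit_transform(X)
print(np.var(X_std, axis=0))    # ~[1.0, 1.0] -- the variance signal is gone
```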

Feature selection is flexible, and there are many ways to select features. The order of feature selection should be chosen according to the situation at hand.

answered Oct 19 '22 by AndyShan


Good answers here. One point needs to be highlighted: PCA is a form of dimensionality reduction. It finds a lower-dimensional linear subspace that approximates the data well. When the axes of this subspace align with the features you started with, it leads to interpretable feature selection as well. Otherwise, feature selection after PCA will yield features that are linear combinations of the original features, and these are difficult to interpret in terms of the original feature set.
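A small sketch of the interpretability point (the iris dataset here is just a convenient example): each principal component carries loadings over all original features, so selecting a component keeps a mixture of features rather than any single original one.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = load_iris()
X = StandardScaler().fit_transform(data.data)
pca = PCA(n_components=2).fit(X)

# Each row of components_ is one component's loadings over the original
# features; a component is a linear combination of all of them.
for i, loadings in enumerate(pca.components_):
    print(f"PC{i + 1}:", dict(zip(data.feature_names, loadings.round(2))))
```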

answered Oct 19 '22 by Hari