Right order of doing feature selection, PCA and normalization?

I know that feature selection helps me remove features with low contribution. I know that PCA combines possibly correlated features into a smaller set of uncorrelated components, reducing the number of dimensions. I know that normalization transforms features to the same scale.

But is there a recommended order for these three steps? Logically, I would think I should first weed out bad features via feature selection, then normalize them, and finally use PCA to reduce the dimensions and make the features as independent of each other as possible.

Is this logic correct?

Bonus question - are there any other preprocessing or transformation steps I should apply to the features before feeding them into the estimator?

asked Sep 05 '17 by shikhanshu

People also ask

Should I do normalization before PCA?

Yes, it is necessary to normalize data before performing PCA. PCA calculates a new projection of your data set, and the new axes are based on the standard deviation of your variables.
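A quick illustration of this point (with made-up, hypothetical data): a feature with a large raw scale dominates the principal components until you standardize.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(0, 1000, 500),  # large-scale feature
    rng.normal(0, 1, 500),     # small-scale feature
])

print(PCA().fit(X).explained_variance_ratio_)
# ~[1.0, 0.0] -- the first component is essentially just the large-scale feature

X_scaled = StandardScaler().fit_transform(X)
print(PCA().fit(X_scaled).explained_variance_ratio_)
# ~[0.5, 0.5] -- after standardization, each feature contributes comparably
```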

Which comes first feature selection or feature engineering?

Similar to feature engineering, different feature selection algorithms are optimal for different types of data. And as always, the goals of the data scientist have to be accounted for as well when choosing the feature selection algorithm. But before all of this, feature engineering should always come first.

Do we need normalization after PCA?

Normalization is important in PCA since it is a variance-maximizing exercise: it projects your original data onto the directions that maximize the variance.

Should I scale data before feature selection?

Some feature selection methods depend on the scale of the data, in which case it is best to scale beforehand. Other methods don't depend on the scale, in which case it doesn't matter. Either way, all preprocessing should be done after the train/test split, fitting only on the training data.
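A small sketch of "preprocess after the split" (the data here is a placeholder): fit the scaler on the training set only, then apply the same fitted transform to the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.rand(100, 5)  # placeholder data
X_train, X_test = train_test_split(X, random_state=0)

scaler = StandardScaler().fit(X_train)  # statistics come from the train set only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)     # no information leaks from the test set
```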


3 Answers

If I were building a classifier of some sort, I would personally use this order:

  1. Normalization
  2. PCA
  3. Feature Selection

Normalization: You would normalize first to get the data into reasonable bounds. If you have data (x, y) where x ranges from -1000 to +1000 and y ranges from -1 to +1, any distance metric would automatically treat a change in y as less significant than a change in x. We don't know yet whether that is the case, so we normalize the data.

PCA: Uses the eigenvalue decomposition of the data's covariance matrix to find an orthogonal basis that describes the variance in the data points. If you have 4 features, PCA may show you that only 2 of them really differentiate the data points, which brings us to the last step.

Feature Selection: Once you have a coordinate space that better describes your data, you can select which features are salient. Typically you'd use the largest eigenvalues (EVs) and their corresponding eigenvectors from PCA for your representation. Since larger EVs mean there is more variance in that data direction, you get more granularity in isolating features. This is a good method to reduce the number of dimensions of your problem.

Of course, this could change from problem to problem, but it is a reasonable generic guide; a minimal sketch of the order follows below.
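Here is one way to express that Normalization → PCA → Feature Selection order as a scikit-learn pipeline. The dataset, component counts, and classifier are placeholder choices for illustration, not anything prescribed above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each step is fit on the training data only; the pipeline then applies
# the same fitted transformations to the test data.
pipe = Pipeline([
    ("scale", StandardScaler()),               # 1. normalization
    ("pca", PCA(n_components=10)),             # 2. orthogonal basis, ordered by variance
    ("select", SelectKBest(f_classif, k=5)),   # 3. keep the most salient components
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_train, y_train)
print("test accuracy:", pipe.score(X_test, y_test))
```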

answered Oct 19 '22 by andrew


Generally speaking, normalization is needed before PCA. The key question is the order of feature selection, and that depends on the feature selection method.

A simple feature selection method is to check whether the variance or standard deviation of a feature is small. If these values are relatively small, the feature may not help the classifier. But if you normalize before doing this, the standard deviations and variances become smaller (generally less than 1), which results in very small differences in std or variance between the different features. If you use zero-mean, unit-variance normalization, every feature ends up with mean 0 and std 1, so this criterion can no longer distinguish them. In that case, it would be bad to normalize before feature selection.
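A minimal sketch of that point (with made-up data): variance-based selection becomes meaningless after standardization, because every standardized feature has variance 1.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(0, 10.0, 200),   # informative, high-variance feature
    rng.normal(0, 0.01, 200),   # near-constant feature
])

print(np.var(X, axis=0))        # very different variances -> easy to threshold
print(VarianceThreshold(threshold=0.1).fit(X).get_support())  # [ True False]

X_std = StandardScaler().fit_transform(X)
print(np.var(X_std, axis=0))    # ~[1.0, 1.0] -- the variance signal is gone
```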

Feature selection is flexible, and there are many ways to select features. The order of feature selection should be chosen according to the situation at hand.

answered Oct 19 '22 by AndyShan


Good answers here. One point needs to be highlighted: PCA is a form of dimensionality reduction. It finds a lower-dimensional linear subspace that approximates the data well. When the axes of this subspace align with the features you started with, it leads to interpretable feature selection as well. Otherwise, feature selection after PCA will yield features that are linear combinations of the original features, and these are difficult to interpret in terms of the original feature set.
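A small sketch of the interpretability point (the iris dataset here is just a convenient example): each principal component carries loadings over all original features, so selecting a component keeps a mixture of features rather than any single original one.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = load_iris()
X = StandardScaler().fit_transform(data.data)
pca = PCA(n_components=2).fit(X)

# Each row of components_ is one component's loadings over the original
# features; a component is a linear combination of all of them.
for i, loadings in enumerate(pca.components_):
    print(f"PC{i + 1}:", dict(zip(data.feature_names, loadings.round(2))))
```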

answered Oct 19 '22 by Hari