Which feature scaling method to use before PCA?

I am working on a Kaggle dataset: https://www.kaggle.com/c/santander-customer-satisfaction. I understand that some sort of feature scaling is needed before PCA. I read in this post and this post that normalization works best; however, it was standardization that gave me the highest performance (AUC-ROC).

I tried all the feature scaling methods from sklearn, including RobustScaler(), Normalizer(), MinMaxScaler(), MaxAbsScaler() and StandardScaler(). Then, using the scaled data, I ran PCA. But it turns out that the optimal number of principal components varies greatly between these methods.

Here's the code I use:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# Fit PCA with all components to inspect the explained variance
pca = PCA(n_components=X_train_scaled.shape[1])
pca.fit(X_train_scaled)
ratios = pca.explained_variance_ratio_
cum_ratios = np.cumsum(ratios)

# Plot the cumulative explained variance ratio
x = np.arange(X_train_scaled.shape[1])
plt.plot(x, cum_ratios, '-o')
plt.xlabel("Number of Principal Components")
plt.ylabel("Cumulative Explained Variance Ratio")
plt.title("Variance Explained by Principal Components")
plt.show()

# Smallest number of components explaining at least 99% of the variance
num_pca = np.argmax(cum_ratios >= 0.99) + 1
print("The optimal number of components is: {}".format(num_pca))

These are the numbers of principal components I got using the different scalers (see the comparison sketch after this list):

  • RobustScaler: 9
  • Normalizer: 26
  • MinMaxScaler: 45
  • MaxAbsScaler: 45
  • StandardScaler: 142
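
For reference, here is a minimal sketch of how such a comparison could be run in one loop (assuming the same X_train as above and the same 99% variance threshold; the exact counts will depend on the data and the train/test split):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import (MaxAbsScaler, MinMaxScaler, Normalizer,
                                   RobustScaler, StandardScaler)

scalers = [RobustScaler(), Normalizer(), MinMaxScaler(),
           MaxAbsScaler(), StandardScaler()]

for scaler in scalers:
    X_scaled = scaler.fit_transform(X_train)
    # Keep all components, then find where cumulative variance hits 99%
    cum = np.cumsum(PCA().fit(X_scaled).explained_variance_ratio_)
    n_components = np.argmax(cum >= 0.99) + 1
    print("{}: {}".format(type(scaler).__name__, n_components))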

So, my question is, which method is the right one for feature scaling in this situation? Thanks!

asked May 14 '16 by George Liu

People also ask

Should I do feature selection before PCA?

Anyway, the correct answer is: it depends. Typically, a feature selection step comes after PCA (with an optimization parameter describing the number of features), and scaling comes before PCA. However, depending on the problem, this may change. You might want to apply PCA only on a subset of features.

Is PCA used for feature scaling?

Principal Component Analysis (PCA) is also a good example of when feature scaling is important, since we are interested in the components that maximize the variance, and we therefore need to ensure that we are comparing apples to apples.

Which preprocessing steps is the most crucial before performing PCA?

Before applying PCA, always check the variance of each feature in the dataset; if there is a large gap between the variances, scale the data with a proper scaler.

Should I scale features before PCA?

The rule of thumb is that if your data is already on the same scale (e.g. every feature is XX per 100 inhabitants), scaling it will remove the information contained in the fact that your features have unequal variances. If the data is on different scales, then you should normalize it before running PCA.
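
A small illustration of that rule (a hypothetical sketch on synthetic data, not part of the snippet above): when one feature's scale dwarfs the others, the first principal component is essentially that feature alone, unless the data is standardized first:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(42)
X = rng.normal(size=(500, 3))
X[:, 0] *= 1000  # put one feature on a much larger scale

print(PCA(n_components=1).fit(X).explained_variance_ratio_)
# close to 1.0 -- the large-scale feature dominates the first component

X_std = StandardScaler().fit_transform(X)
print(PCA(n_components=1).fit(X_std).explained_variance_ratio_)
# roughly 1/3 -- variance is now spread across the components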


1 Answer

Data on which the PCA transformation is calculated should be normalized, meaning in this case:

  • zero mean
  • unit variance

This is basically sklearn's StandardScaler, which I would prefer among your candidates. The reasons are explained on Wikipedia and also here.
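
As a quick sanity check (a sketch added here, not part of the original answer), you can verify that StandardScaler yields exactly these two properties, even for skewed data on mixed scales:

import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.lognormal(size=(1000, 4)) * [1, 10, 100, 1000]  # skewed, mixed scales

X_std = StandardScaler().fit_transform(X)
print(np.allclose(X_std.mean(axis=0), 0))  # True: zero mean
print(np.allclose(X_std.std(axis=0), 1))   # True: unit variance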

  • sklearn's Normalizer is missing the zero mean
  • Both min-max scalers (MinMaxScaler and MaxAbsScaler) are missing unit variance
  • RobustScaler could work on some data (outliers!), but I would prefer StandardScaler.
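
To follow that recommendation, one option (a minimal sketch, assuming the question's X_train) is to chain StandardScaler and PCA in a pipeline; passing a float between 0 and 1 as n_components tells sklearn's PCA to keep just enough components to explain that fraction of the variance:

from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# PCA with 0 < n_components < 1 keeps the smallest number of
# components whose cumulative explained variance reaches 99%
pipeline = make_pipeline(StandardScaler(), PCA(n_components=0.99))
X_train_pca = pipeline.fit_transform(X_train)
print(X_train_pca.shape[1])  # number of components retained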
answered Sep 28 '22 by sascha