I have a set of 70 input variables on which I need to perform PCA. As per my understanding, standardizing the data so that each input variable has mean 0 and variance 1 is necessary before applying PCA.

I am having a hard time figuring out whether I need to apply preprocessing.StandardScaler() before passing my data set to PCA, or whether sklearn's PCA function does it on its own. If the latter is the case, then the explained_variance_ratio_ should be the same whether or not I apply preprocessing.StandardScaler(). But the results are different, hence I believe preprocessing.StandardScaler() is necessary before applying PCA. Is this true?
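The discrepancy described above can be reproduced with a small sketch; the data here is synthetic (shapes and scales are assumed purely for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data: five features, one of them on a much larger scale
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) * np.array([1.0, 1.0, 100.0, 1.0, 1.0])

pca_raw = PCA(n_components=2).fit(X)
pca_scaled = PCA(n_components=2).fit(StandardScaler().fit_transform(X))

print(pca_raw.explained_variance_ratio_)     # dominated by the large-scale feature
print(pca_scaled.explained_variance_ratio_)  # variance spread across components
```

The two ratios differ, which is exactly the observation that prompted the question.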
Hello! Yes, it is necessary to normalize the data before performing PCA. PCA computes a new projection of your data set, and the new axes are based on the variance of your variables.

Before PCA, we standardize/normalize the data. Usually, normalization is done so that all features are on the same scale. For example, a housing price prediction dataset has features measured in very different units.
Scaling (what I would call centering and scaling) is very important for PCA because of the way the principal components are calculated.
Normalization is important in PCA because PCA is a variance-maximizing exercise: it projects your original data onto the directions that maximize the variance. Without normalization, a feature measured on a large scale contributes a disproportionately large variance and dominates the leading principal components.
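A quick sketch of that domination effect, using made-up data where one feature is simply expressed in much larger units than the others:

```python
import numpy as np
from sklearn.decomposition import PCA

# Three independent features; the third is in units ~50x larger
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
X[:, 2] *= 50.0

pca = PCA().fit(X)
# The first principal component points almost entirely along feature 2,
# simply because that feature has the largest variance.
print(np.abs(np.round(pca.components_[0], 3)))
```

After standardizing, no single feature would be able to hijack the first component this way.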
Yes, it's true: scikit-learn's PCA does not apply standardization to the input dataset; it only centers it by subtracting the mean.
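That claim is easy to check empirically. In this sketch (toy data, assumed scales), centering by hand changes nothing, while standardizing does:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Toy data with deliberately different feature scales and a nonzero mean
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4)) * np.array([1.0, 5.0, 0.1, 2.0]) + 3.0

# Centering by hand changes nothing: PCA already subtracts the mean internally
pca = PCA().fit(X)
pca_centered = PCA().fit(X - X.mean(axis=0))
assert np.allclose(pca.explained_variance_ratio_,
                   pca_centered.explained_variance_ratio_)

# Standardizing does change the result: PCA does not rescale to unit variance
pca_std = PCA().fit(StandardScaler().fit_transform(X))
assert not np.allclose(pca.explained_variance_ratio_,
                       pca_std.explained_variance_ratio_)
```

So if you want PCA on the correlation matrix rather than the covariance matrix, you must apply StandardScaler (or equivalent) yourself first.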
See also this post.