How many principal components to take?

I know that principal component analysis performs an SVD on a matrix and then produces a matrix of eigenvalues. To select the principal components, we keep only the first few eigenvalues. How do we decide how many eigenvalues to keep from that matrix?

284

asked Aug 22 '12 06:08

London guy

1 Answer

To decide how many eigenvalues/eigenvectors to keep, first consider why you are doing PCA at all. Is it to reduce storage requirements, to reduce dimensionality for a classification algorithm, or for some other reason? If you don't have any strict constraints, I recommend plotting the cumulative sum of the eigenvalues (sorted in descending order). If you divide each value by the total sum of the eigenvalues before plotting, the plot shows the fraction of total variance retained vs. the number of eigenvalues kept. The plot then gives a good indication of where you hit the point of diminishing returns (i.e., where retaining additional eigenvalues adds little variance).
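The cumulative-variance curve described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the data matrix `X` is synthetic, the function names (`cumulative_variance_fraction`, `components_for_threshold`, `plot_cumulative`) are made up for this example, and the 0.95 threshold is an arbitrary choice, not a recommended value.

```python
import numpy as np


def cumulative_variance_fraction(X):
    """Cumulative fraction of total variance retained, per number of components."""
    # Center each column so the SVD describes variance around the mean.
    X_centered = X - X.mean(axis=0)
    # Singular values are returned in descending order.
    singular_values = np.linalg.svd(X_centered, compute_uv=False)
    # Eigenvalues of the sample covariance matrix are s**2 / (n - 1).
    eigenvalues = singular_values**2 / (X.shape[0] - 1)
    # Divide by the total so the curve ends at (approximately) 1.0.
    return np.cumsum(eigenvalues) / eigenvalues.sum()


def components_for_threshold(cumulative, threshold=0.95):
    """Smallest component count whose cumulative fraction reaches the threshold."""
    k = int(np.searchsorted(cumulative, threshold)) + 1
    # Clamp in case rounding leaves the last value slightly below the threshold.
    return min(k, len(cumulative))


def plot_cumulative(cumulative):
    """Plot the curve; matplotlib is imported lazily and only needed here."""
    import matplotlib.pyplot as plt

    counts = np.arange(1, len(cumulative) + 1)
    plt.plot(counts, cumulative, marker="o")
    plt.xlabel("Number of eigenvalues kept")
    plt.ylabel("Fraction of total variance retained")
    plt.show()


# Synthetic example (an assumption for illustration): 5 observed features
# driven by 2 latent factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

cumulative = cumulative_variance_fraction(X)
k = components_for_threshold(cumulative, threshold=0.95)
print(cumulative)
print("components needed for 95% of the variance:", k)
```

Calling `plot_cumulative(cumulative)` draws the curve itself; the point where it flattens is the point of diminishing returns. A fixed threshold is just one way to read that curve, and the right cutoff still depends on why you are doing PCA.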

192

answered Sep 29 '22 13:09

bogatron

Tags:

machine-learning

data-mining

svd
