
How many principal components to take?

I know that principal component analysis performs an SVD on a matrix and produces a matrix of eigenvalues. To select the principal components, we keep only the first few eigenvalues. How do we decide how many eigenvalues to keep from that matrix?

London guy asked Aug 22 '12 06:08


People also ask

How do you choose the number of principal components of PCA?

A widely applied approach is to decide on the number of principal components by examining a scree plot: eyeball the plot and look for the point at which the proportion of variance explained by each subsequent principal component drops off. This point is often referred to as the elbow of the scree plot.

How many principal components can you have?

For a data set with n observations and p variables, there are at most min(n - 1, p) principal components with nonzero variance.
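The min(n - 1, p) bound can be checked on a small example. The sketch below assumes a made-up data set with n = 3 observations and p = 5 features; the helper names `center_columns` and `matrix_rank` are illustrative, not from any library. It counts the independent directions left after centering, which equals the number of components with nonzero variance.

```python
from fractions import Fraction


def center_columns(rows):
    """Subtract each column's mean so every feature has mean zero."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(Fraction(r[j]) for r in rows) / n for j in range(p)]
    return [[Fraction(r[j]) - means[j] for j in range(p)] for r in rows]


def matrix_rank(rows):
    """Rank via exact Gaussian elimination over Fractions (no rounding error)."""
    m = [[Fraction(x) for x in r] for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        # Look for a nonzero pivot at or below the current pivot row.
        pivot = next((i for i in range(rank, n_rows) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for i in range(rank + 1, n_rows):
            factor = m[i][col] / m[rank][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
        if rank == n_rows:
            break
    return rank


# Hypothetical data: n = 3 observations (rows), p = 5 features (columns).
data = [
    [1, 2, 0, 4, 1],
    [3, 1, 5, 0, 2],
    [2, 7, 1, 3, 9],
]
n, p = len(data), len(data[0])
centered = center_columns(data)
nonzero_components = matrix_rank(centered)
print(nonzero_components, min(n - 1, p))
```

Centering costs one degree of freedom because the centered rows sum to zero, which is where the n - 1 comes from.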

How many principal components are required to explain 95% of the variance?

Plot the cumulative explained variance against the number of principal components and read off where the curve reaches 95%. In the example this snippet is taken from, 9 principal components were needed to explain 95% of the variance.


1 Answer

To decide how many eigenvalues/eigenvectors to keep, you should consider your reason for doing PCA in the first place. Are you doing it to reduce storage requirements, to reduce dimensionality for a classification algorithm, or for some other reason? If you don't have any strict constraints, I recommend plotting the cumulative sum of the eigenvalues (sorted in descending order). If you divide each value by the total sum of the eigenvalues before plotting, the plot will show the fraction of total variance retained vs. the number of eigenvalues kept. The plot will then give a good indication of where you hit the point of diminishing returns (i.e., where little variance is gained by retaining additional eigenvalues).
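As a minimal sketch of the normalized cumulative sum described above: the eigenvalues here are made-up numbers, and the function names `cumulative_variance_fraction` and `components_for_fraction` are hypothetical helpers, not part of any PCA library. The target fraction is also an assumption; the right cutoff depends on why you are doing PCA.

```python
from itertools import accumulate


def cumulative_variance_fraction(eigenvalues):
    """Return the fraction of total variance retained by the first k eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    return [running / total for running in accumulate(ordered)]


def components_for_fraction(eigenvalues, target):
    """Smallest k whose cumulative variance fraction reaches the target."""
    fractions = cumulative_variance_fraction(eigenvalues)
    for k, fraction in enumerate(fractions, start=1):
        if fraction >= target:
            return k
    return len(fractions)


# Hypothetical eigenvalues, e.g. from a covariance matrix of 4 features.
eigenvalues = [4.0, 3.0, 2.0, 1.0]
fractions = cumulative_variance_fraction(eigenvalues)
k = components_for_fraction(eigenvalues, 0.85)
print(fractions, k)
```

Plotting `fractions` against 1, 2, ..., len(fractions) with any plotting tool gives the curve the answer refers to; the point where it flattens is the point of diminishing returns.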

bogatron answered Sep 29 '22 13:09