
Getting cumulative proportion in PCA

Tags:

r

I want to retrieve the cumulative proportion of explained variance after a PCA in R. summary(pca) prints this in its last row, but how can I extract that row?

summary(prcomp(USArrests, scale = TRUE))
Importance of components:
                          PC1    PC2     PC3     PC4
Standard deviation     1.5749 0.9949 0.59713 0.41645
Proportion of Variance 0.6201 0.2474 0.08914 0.04336
Cumulative Proportion  0.6201 0.8675 0.95664 1.00000

I tried s <- summary(prcomp(USArrests, scale = TRUE)) and then s[3] etc., but that doesn't return the last row.

bigTree asked May 26 '14 09:05


1 Answer

Expanding on user20650's answer in the question's comments, as I believe it answers the question most directly (i.e. via the object itself, rather than recalculating). TL;DR: s$importance[3, ].

(s <- summary(prcomp(USArrests, scale = TRUE)))
# Importance of components:
#                           PC1    PC2     PC3     PC4
# Standard deviation     1.5749 0.9949 0.59713 0.41645
# Proportion of Variance 0.6201 0.2474 0.08914 0.04336
# Cumulative Proportion  0.6201 0.8675 0.95664 1.00000

str(s)
# List of 6
#  $ sdev      : num [1:4] 1.575 0.995 0.597 0.416
#  $ rotation  : num [1:4, 1:4] -0.536 -0.583 -0.278 -0.543 0.418 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:4] "Murder" "Assault" "UrbanPop" "Rape"
#   .. ..$ : chr [1:4] "PC1" "PC2" "PC3" "PC4"
#  $ center    : Named num [1:4] 7.79 170.76 65.54 21.23
#   ..- attr(*, "names")= chr [1:4] "Murder" "Assault" "UrbanPop" "Rape"
#  $ scale     : Named num [1:4] 4.36 83.34 14.47 9.37
#   ..- attr(*, "names")= chr [1:4] "Murder" "Assault" "UrbanPop" "Rape"
#  $ x         : num [1:50, 1:4] -0.976 -1.931 -1.745 0.14 -2.499 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:50] "Alabama" "Alaska" "Arizona" "Arkansas" ...
#   .. ..$ : chr [1:4] "PC1" "PC2" "PC3" "PC4"
#  $ importance: num [1:3, 1:4] 1.575 0.62 0.62 0.995 0.247 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:3] "Standard deviation" "Proportion of Variance" "Cumulative Proportion"
#   .. ..$ : chr [1:4] "PC1" "PC2" "PC3" "PC4"
#  - attr(*, "class")= chr "summary.prcomp"

# We see that importance is the relevant element
s$importance
#                             PC1       PC2       PC3       PC4
# Standard deviation     1.574878 0.9948694 0.5971291 0.4164494
# Proportion of Variance 0.620060 0.2474400 0.0891400 0.0433600
# Cumulative Proportion  0.620060 0.8675000 0.9566400 1.0000000

# Cool, same as the displayed table. Grab the third row and voila.
s$importance[3, ]  # Numeric vector
#     PC1     PC2     PC3     PC4 
# 0.62006 0.86750 0.95664 1.00000 
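
As a hedged follow-up sketch (not part of the original answer): the row can also be selected by name rather than by position, and cross-checked against a manual recalculation from the standard deviations. The variable names pca, cum_prop and cum_manual are illustrative. It also shows why s[3] did not work in the question: per the str(s) output above, the third list element is center, not the importance table. The tolerance is an assumption based on the values in s$importance appearing to be rounded to five decimals.

```r
# USArrests ships with base R (datasets package)
pca <- prcomp(USArrests, scale. = TRUE)  # scale = TRUE also works via partial matching
s <- summary(pca)

# s is a list, so s[3] is a one-element list holding its third component
names(s[3])  # "center", per the str(s) output above

# Select the row by name instead of by position
cum_prop <- s$importance["Cumulative Proportion", ]

# Manual recalculation: variance of each PC is sdev^2
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_manual <- cumsum(var_explained)

# Compare with a tolerance, since the summary table looks rounded
all.equal(unname(cum_prop), cum_manual, tolerance = 1e-4)
```

Indexing by name is slightly more robust than [3, ] if the row order of the importance matrix were ever to change.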
Max Ghenis answered Sep 30 '22 23:09