R - how to make PCA biplot more readable

Tags: plot, r, pca

I have a set of observations with 23 variables.

When I use prcomp and biplot to plot the results I run into several problems:

  1. The actual plot occupies only half of the frame (x < 0), but the plot is centered on 0, so half of the space is wasted.

  2. Two variables clearly dominate the results, so all the other arrows are clumped together and I can't read a thing.

Ad 1. I tried setting xlim and/or ylim, but I'm obviously doing something wrong, since the plot is all messed up when I do.

Ad 2. Can I somehow place the arrow labels further apart so that I can read them? Or could I plot the arrows without the two longest ones (a kind of zoom-in)?

My PCA plot

Addendum: is it possible to have biplot draw the labels in a different color than the arrows?

Also: is it a problem if the x and y axes are not proportional (the graph shows intervals of different length on x and y)? I think this would skew the angles between arrows, since that kind of resizing is not a similarity transformation. Is it possible to force biplot to keep a 1:1 aspect ratio, or to draw the plot as a rectangle rather than a square?

Asked by Jakub Bochenski, Jun 11 '13 23:06


1 Answer

I think you can use xlim and ylim. Also, have a look at the expand argument for ?biplot. Unfortunately, you did not provide any data, so let's take some sample data:

a <- princomp(USArrests)

Below is the result of simply calling biplot:

biplot(a)


And now one can "zoom in" to have a closer look at "Murder" and "Rape" using xlim and ylim and also use the scaling argument expand from ?biplot:

biplot(a, expand=10, xlim=c(-0.30, 0.0), ylim=c(-0.1, 0.1))


Please note the different scaling on the top and right axis due to the expand factor.
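The question also asked whether the two longest arrows could simply be left out. One possible sketch, assuming it is acceptable to bypass the princomp method of biplot, is to pass the scores and a subset of the loadings straight to the default method. The names pca, load2, arrow_len and keep are only illustrative. The princomp method rescales scores and loadings before plotting, so the picture will not match biplot(a) exactly.

```r
# Sketch only: hide the two variables with the longest arrows in the
# PC1/PC2 plane and plot the rest with the default biplot method.
pca <- princomp(USArrests)

# Loadings of the first two components as a plain matrix
load2 <- unclass(pca$loadings)[, 1:2]

# Arrow length of each variable in the PC1/PC2 plane
arrow_len <- sqrt(rowSums(load2^2))

# Keep everything except the two longest arrows (illustrative choice)
keep <- names(sort(arrow_len))[seq_len(length(arrow_len) - 2)]

# biplot() on two matrices dispatches to the default method, which
# applies no princomp-specific rescaling
biplot(pca$scores[, 1:2], load2[keep, , drop = FALSE], cex = c(0.6, 1))
```

For USArrests this drops Assault and UrbanPop, the two variables that dominate the unscaled PCA, and leaves Murder and Rape.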

Does this help to make your plot more readable?

EDIT

You also asked whether labels and arrows can have different colors. biplot does not support this: its col argument takes one color for the observations and one for the variables, and each variable's arrow and label share that second color. What you could do is copy the code of stats:::biplot.default and change it to suit your needs (change the col argument where plot, axis and text are used).
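If copying biplot.default feels too heavy, a rough alternative is to draw the biplot by hand with base graphics, which gives independent colors for arrows and labels and lets asp = 1 keep the axes proportional. This is a minimal sketch, not what biplot itself does: the scaling factor mult and the names scores, loadings and arrow_ends are assumptions made for illustration.

```r
# Sketch only: hand-drawn biplot with separate colours for arrows and labels.
fit <- prcomp(USArrests, scale. = TRUE)
scores   <- fit$x[, 1:2]         # observations on PC1/PC2
loadings <- fit$rotation[, 1:2]  # variable loadings on PC1/PC2

# Assumed display scaling: stretch the loadings so that the longest arrow
# reaches 80% of the largest absolute score. Purely cosmetic.
mult <- 0.8 * max(abs(scores)) / max(sqrt(rowSums(loadings^2)))
arrow_ends <- loadings * mult

# asp = 1 keeps one unit on x as long as one unit on y, so the angles
# between arrows are not distorted
plot(scores, type = "n", asp = 1, xlab = "PC1", ylab = "PC2")
text(scores, labels = rownames(scores), cex = 0.6, col = "grey40")
arrows(0, 0, arrow_ends[, 1], arrow_ends[, 2], length = 0.1, col = "red")
text(arrow_ends * 1.1, labels = rownames(arrow_ends), col = "blue")
```

Because this is an ordinary plot call, xlim and ylim can be added to it to zoom in, and asp = 1 still preserves the aspect ratio.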

Alternatively, you could use ggplot for the biplot. In the post here, a simple biplot function is implemented. You could change the code as follows:

library(ggplot2)  # ggplot(), aes(), geom_*(), coord_equal()
library(grid)     # arrow() and unit()

PCbiplot <- function(PC, x="PC1", y="PC2", colors=c('black', 'black', 'red', 'red')) {
    # PC is a prcomp object
    # colors: observation labels, axis lines, variable labels, arrows
    data <- data.frame(obsnames=row.names(PC$x), PC$x)
    p <- ggplot(data, aes(x=.data[[x]], y=.data[[y]])) +
        geom_text(alpha=.4, size=3, aes(label=obsnames), color=colors[1])
    # Reference lines through the origin; the intercepts must be passed by name
    # (linewidth needs ggplot2 >= 3.4.0; use size on older versions)
    p <- p + geom_hline(yintercept=0, linewidth=.2, color=colors[2]) +
        geom_vline(xintercept=0, linewidth=.2, color=colors[2])
    datapc <- data.frame(varnames=rownames(PC$rotation), PC$rotation)
    # Scale the loadings to the scores: (range of scores) / (range of loadings)
    mult <- min(
        (max(data[,y]) - min(data[,y])) / (max(datapc[,y]) - min(datapc[,y])),
        (max(data[,x]) - min(data[,x])) / (max(datapc[,x]) - min(datapc[,x]))
        )
    datapc$v1 <- .7 * mult * datapc[[x]]
    datapc$v2 <- .7 * mult * datapc[[y]]
    p <- p + coord_equal() + geom_text(data=datapc, aes(x=v1, y=v2, label=varnames), size=5, vjust=1, color=colors[3])
    p <- p + geom_segment(data=datapc, aes(x=0, y=0, xend=v1, yend=v2), arrow=arrow(length=unit(0.2,"cm")), alpha=0.75, color=colors[4])
    p
}

Plot as follows:

fit <- prcomp(USArrests, scale.=TRUE)
PCbiplot(fit, colors=c("black", "black", "red", "yellow"))


If you play around a bit with this function, I am sure you can figure out how to set xlim and ylim values, etc.

Answered by user1981275, Sep 24 '22 11:09