PCA multiplot in R

Tags:

I have a dataset that looks like this:

India   China   Brasil  Russia  SAfrica Kenya   States  Indonesia   States  Argentina   Chile   Netherlands HongKong
0.0854026763    0.1389383234    0.1244184371    0.0525460881    0.2945586244    0.0404562539    0.0491597968    0   0   0.0618342901    0.0174891774    0.0634064181    0
0.0519483159    0.0573851759    0.0756806292    0.0207164181    0.0409872092    0.0706355932    0.0664503936    0.0775285039    0.008545575 0.0365674701    0.026595575 0.064280902 0.0338135148
0   0   0   0   0   0   0   0   0   0   0   0   0
0.0943708876    0   0   0.0967733329    0   0.0745076688    0   0   0   0.0427047276    0   0.0583873189    0
0.0149521013    0.0067569437    0.0108914448    0.0229991162    0.0151678343    0.0413174214    0   0.0240999375    0   0.0608951432    0.0076549109    0   0.0291972756
0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0.0096710124    0.0095669967    0   0.0678582869    0   0   0.0170707337    0.0096565543    0.0116698364    0.0122773071
0.1002690681    0.0934563916    0.0821680095    0.1349534369    0.1017157777    0.1113249348    0.1713480649    0.0538715423    0.4731833978    0.1956743964    0.6865919069    0.2869189344    0.5364034876
1.5458338337    0.2675380321    0.6229046372    0.5059107039    0.934209603 0.4933799388    0.4259769181    0.3534169521    14.4134845836   4.8817632117    13.4034293299   3.7849346739    12.138551171
0.4625375671    0.320258205 0.4216459567    0.4992764309    0.4115887595    0.4783677078    0.4982410179    0.2790259278    0.3804405781    0.2594924212    0.4542162376    0.3012339384    0.3450847892
0.357614592 0.3932670219    0.3803417257    0.4615355254    0.3807061655    0.4122433346    0.4422282977    0.3053712842    0.297943232 0.2658160167    0.3244018409    0.2523836582    0.3106600754
0.359953567 0.3958391813    0.3828293473    0.4631507073    0.3831961707    0.4138590365    0.4451206879    0.3073685624    0.2046559772    0.2403036541    0.2326305393    0.2269373716    0.2342962436
0.7887404662    0.6545878236    0.7443676393    0.7681244767    0.5938002158    0.5052305973    0.4354571648    0.40511005  0.8372481106    0.5971130339    0.8025313223    0.5708610817    0.8556609579
0.5574207497    1.2175251783    0.8797484259    0.952685465 0.4476585005    1.1919229479    1.03612509  0.5490564488    0.2407034171    0.5675492645    0.4994121344    0.5460544861    0.3779468604
0.5632651223    1.0181714714    1.1253803155    1.228293512 0.6949993291    1.0346288085    0.5955221073    0.5212567091    1.1674901423    1.2442735568    1.207624867 1.3854352274    0.7557131826
0.6914760031    0.7831502333    1.0282730148    0.750270567 0.7072739935    0.8041764647    0.8918512571    0.6998554585    2.3448306081    1.2905783367    2.4295927684    1.3029766224    1.9310763864
0.3459898177    0.7474525109    0.7253451876    0.7182493014    0.3081791886    0.7462088907    0.5950509439    0.4443221541    3.6106852374    2.7647504885    3.3698608994    2.6523062395    1.8016571476
0.4629523517    0.6549211677    0.6158018856    0.7637088814    0.4951554309    0.6277236471    0.6227669055    0.383909839 2.9502307101    1.803480973 2.3083113522    1.668759497 1.7130459012
0.301548861 0.5961888126    0.4027007075    0.5540290853    0.4078662541    0.5108773106    0.4610682726    0.3712800134    0.3813402422    0.7391417247    1.0935364978    0.691857974 0.4416304953
2.5038287529    3.2005148394    2.9181517373    3.557918333 1.8868234768    2.9369926312    0.4117894127    0.3074815035    3.9187777037    7.3161555954    6.9586996112    5.7096144353    2.7007439732
2.5079707359    3.2058093222    2.9229791182    3.563804054 1.8899447728    2.9418511798    0.4124706194    0.269491388 3.9252603798    7.3282584169    6.9702111077    5.7190596205    2.7052117051
2.6643724791    1.2405320493    2.0584120188    2.2354369334    1.7199730388    2.039829709 1.7428132997    0.9977029725    8.9650886611    4.6035139163    8.1430131464    5.2450639988    6.963309864
0.5270581435    0.8222128903    0.7713479951    0.8785815313    0.624993821 0.7410405193    0.5350834321    0.4797121891    1.3753525725    1.2219267886    1.397221881 1.2433155977    0.8647136903
0.2536079475    0.5195514789    0.0492623195    0.416102668 0.2572670724    0.4805482899    0.4866090738    0.4905212099    0.2002506403    0.5508609827    0.3808572148    0.6276294938    0.3191452919
0.3499009885    0.5837491529    0.4914807442    0.5851537888    0.3638549977    0.537655052 0.5757185943    0.4730102035    0.9098072064    0.6197285737    0.7781825654    0.6424684366    0.6424429128
0.6093076876    0.9456457011    0.8518013605    1.1360347777    0.511960743 0.9038104168    0.5048413575    0.2777622235    0.2915840525    0.6628516415    0.4600364351    0.7996524113    0.3765721177
0.9119207879    1.2363073271    1.3285269752    1.4027039939    0.9250782309    2.1599381031    1.312307839 0   0   0.8253250513    0   0   0.8903632354

It is stored in a data.txt file.

I want to have a PCA multiplot that looks like this: THIS

What I am doing:

d <- read.table("data.txt", header=TRUE, as.is=TRUE)
model <- prcomp(d, scale=TRUE)

After this I am lost.

How can I cluster the dataset according to the PCA projections and obtain the pictures similar to those above?

935

asked Jun 18 '14 09:06

Angelo

1 Answers

You are actually asking two different questions:

How to cluster the data after PCA projections.
How to obtain the above plots.

However before getting to those I would like to add that if your samples are in columns, then you are not doing PCA correctly. You should do it on transposed dataset instead like so:

model <- prcomp(t(d), scale=TRUE)

But for that to work you would have to remove all the constant rows in your data.

Now I assume that you did your PCA step how you wanted.

prcomp returns the rotated matrix when you specify retX=TRUE (it's true by default). So you will want to use model$x.

Your next step is clustering the data based on principal components. This can be done in various ways. One is hierarchical clustering. If you want 5 groups in the end here is one way:

fit <- hclust(dist(model$x[,1:3]), method="complete") # 1:3 -> based on 3 components
groups <- cutree(fit, k=5)                            # k=5 -> 5 groups

This step will get you groups that will be later used for coloring.

The final step is plotting. Here I wrote a simple function to do all in one shot:

library(rgl)
plotPCA <- function(x, nGroup) {
    n <- ncol(x) 
    if(!(n %in% c(2,3))) { # check if 2d or 3d
        stop("x must have either 2 or 3 columns")
    }

    fit <- hclust(dist(x), method="complete") # cluster
    groups <- cutree(fit, k=nGroup)

    if(n == 3) { # 3d plot
        plot3d(x, col=groups, type="s", size=1, axes=F)
        axes3d(edges=c("x--", "y--", "z"), lwd=3, axes.len=2, labels=FALSE)
        grid3d("x")
        grid3d("y")
        grid3d("z")
    } else { # 2d plot
        maxes <- apply(abs(x), 2, max)
        rangeX <- c(-maxes[1], maxes[1])
        rangeY <- c(-maxes[2], maxes[2])
        plot(x, col=groups, pch=19, xlab=colnames(x)[1], ylab=colnames(x)[2], xlim=rangeX, ylim=rangeY)
        lines(c(0,0), rangeX*2)
        lines(rangeY*2, c(0,0))
    }
}

This function is simple: it takes two arguments: 1) a matrix of scores, with principal components in columns and your samples in rows. You can basically use model$x[,c(1,2,4)] if you want (for example) 1st, 2nd and 4th components. 2) number of groups for clustering.

Then it cluster the data based on passed principal components and plots (either 2D or 3D depending on the number of columns passed)

Here are few examples:

plotPCA(model$x[,1:2], 5)

PCA2d

And 3D example (based on 3 first principal components):

plotPCA(model$x[,1:3], 5)

PCA3D

This last plot will be interactive so you can rotate it to or zoom in/out.

Hope this helps.

193

answered Oct 03 '22 12:10

Karolis Koncevičius

Related questions
                            
                                Highlighting particular regions of a scatterplot in a ggplot
                            
                                R verify source code
                            
                                How does one turn contour lines into filled contours?
                            
                                change code block color in knitr/markdown
                            
                                Feeding newdata to R predict function
                            
                                Disregarding simple warnings/errors in tryCatch()
                            
                                using scientific notation in R
                            
                                R, filter matrix based on variance cut-offs
                            
                                sprintf format strings: reference by name?
                            
                                R finding rows of a data frame where certain columns match those of another [duplicate]
                            
                                Argument of set.seed in R
                            
                                Numerical derivatives of an arbitrarily defined function
                            
                                "NAs introduced by coercion" during Cluster Analysis in R
                            
                                ggplot2 version 0.9.3.1 won't load into R 3.0.2
                            
                                How to use parameters with RPostgreSQL (to insert data)
                            
                                R barplot: wrapping long text labels?
                            
                                Speeding up the processing of large data frames in R
                            
                                How to add a non-overlapping legend to associate colors with categories in pairs()?
                            
                                How to get multiple ggplot2 scale_fill_gradientn with same scale?
                            
                                how to interpret cca vegan output

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

PCA multiplot in R

Tags:

plot

r

cluster-analysis

3d

pca

Angelo

People also ask

1 Answers

Karolis Koncevičius

Recent Activity

Donate For Us