
Heatmap or plot for a correlation matrix [duplicate]

Tags: r, lattice

I tried to plot a correlation matrix with the lattice library, using three colours to represent the correlation coefficients.

library(lattice)

levelplot(cor)

I obtain the following plot:

[Image: plot of correlation matrix]

The plot is only for a subset of my data. When I use the whole dataset (400 x 400), the plot becomes unclear: the colouring is not shown properly and appears as dots. Is it possible to get the same tile form for a large matrix?

I tried the pheatmap function, but I do not want my values to be clustered; I just want a clear representation of high and low values in tile form.

user2258452 asked Apr 08 '13 19:04


1 Answer

If you want a correlation plot, use the corrplot library, which offers a lot of flexibility for creating heatmap-like figures of correlations:

library(corrplot)
#create data with some correlation structure
jnk=runif(1000)
jnk=(jnk*100)+c(1:500, 500:1)
jnk=matrix(jnk,nrow=100,ncol=10)
jnk=as.data.frame(jnk)
names(jnk)=c("var1", "var2","var3","var4","var5","var6","var7","var8","var9","var10")

#create correlation matrix
cor_jnk=cor(jnk, use="complete.obs")
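#plot cor matrix; p.mat = 1-abs(cor_jnk) with sig.level=0.50 and insig="blank"
#blanks every cell whose absolute correlation is below 0.5
corrplot(cor_jnk, order="AOE", method="circle", tl.pos="lt", type="upper",
         tl.col="black", tl.cex=0.6, tl.srt=45,
         addCoef.col="black", addCoefasPercent = TRUE,
         p.mat = 1-abs(cor_jnk), sig.level=0.50, insig = "blank")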

The code above only adds color to correlations with an absolute value above 0.5, but you can easily change that threshold. There are also many ways to configure the look of the plot (color gradient, how correlations are displayed, full versus half matrix, etc.). The order argument is particularly useful: it lets you reorder the variables in the correlation matrix, for example with the PCA-based "AOE" option (angular order of the eigenvectors), so that variables with similar correlation patterns sit next to each other.
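As a rough sketch of changing the threshold and the color gradient mentioned above: the data, the 0.3 cutoff, and the names `dat`, `threshold` and `pal` are illustrative assumptions, not part of the original answer. The cutoff works through the same `p.mat` trick, so `sig.level` is set to one minus the desired threshold.

```r
library(corrplot)

# Illustrative data only: 10 variables that share a common trend
set.seed(1)
dat <- as.data.frame(matrix(runif(1000) * 100 + c(1:500, 500:1),
                            nrow = 100, ncol = 10))
names(dat) <- paste0("var", 1:10)
cor_dat <- cor(dat, use = "complete.obs")

# Hypothetical cutoff: blank out cells whose absolute correlation is below 0.3
threshold <- 0.3

# Custom gradient, mapped from -1 (blue) through 0 (white) to +1 (red)
pal <- colorRampPalette(c("blue", "white", "red"))(200)

corrplot(cor_dat, order = "AOE", method = "circle", type = "upper",
         col = pal, tl.col = "black", tl.cex = 0.6, tl.srt = 45,
         p.mat = 1 - abs(cor_dat), sig.level = 1 - threshold, insig = "blank")
```

Dropping `type = "upper"` draws the full matrix instead of only the upper triangle.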

For squares, for instance (similar to your original plot), just change method to "square".
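A minimal sketch of the squares variant follows. It uses the built-in `mtcars` dataset purely as a stand-in for your own data; the object name `cor_mt` is an assumption for illustration.

```r
library(corrplot)

# mtcars ships with R and is used here only as stand-in data
cor_mt <- cor(mtcars)

# Same kind of call as before, but drawing squares instead of circles
corrplot(cor_mt, order = "AOE", method = "square", type = "upper",
         tl.col = "black", tl.cex = 0.6, tl.srt = 45)
```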

EDIT: @Carson, you can still use this method for reasonably large correlation matrices, for instance a 100-variable matrix. Beyond that, I fail to see the use of a graphical representation of a correlation matrix with so many variables without some subsetting, as it will be very hard to interpret.
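For a matrix of that size, something along these lines may work. It is a sketch on simulated data (the names `signal`, `loadings`, `big` and `cor_big` are made up for illustration). It assumes you want solid tiles in the original variable order, so it uses `method = "color"` and `order = "original"`, and hides the text labels, which become unreadable at this scale.

```r
library(corrplot)

set.seed(42)
n_obs <- 200
n_var <- 100

# Simulated data: each variable is a shared signal times a loading, plus noise
signal <- rnorm(n_obs)
loadings <- seq(-1, 1, length.out = n_var)
big <- sapply(loadings, function(w) w * signal + rnorm(n_obs))
colnames(big) <- paste0("var", seq_len(n_var))
cor_big <- cor(big)

# method = "color" fills each cell as a solid tile,
# order = "original" keeps the input order (no reordering or clustering),
# tl.pos = "n" suppresses the variable labels
corrplot(cor_big, method = "color", order = "original", tl.pos = "n")
```

If the cells still look too small on screen, writing the plot to a larger graphics device (for example `png()` with a bigger width and height) is one option to try.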

Lucas Fortini answered Oct 09 '22 07:10