
Cluster data in heat map in R ggplot

Tags: r, ggplot2, heatmap

Please see my plot below:

[image: the current heat map]

My code:

 > head(data)
              X0      X1      X2       X3       X4       X5       X6        X7        X8        X9
 NM_001001144 6.52334 9.75243 5.62914 6.833650 6.789850 7.421440 8.675330 12.117600 11.551500  7.676900
 NM_001001327 1.89826 3.74708 1.48213 0.590923 2.915120 4.052600 0.758997  3.653680  1.931400  2.487570
 NM_001002267 1.70346 2.72858 2.10879 1.898050 3.063480 4.435810 7.499640  5.038870 11.128700 22.016500
 NM_001003717 6.02279 7.46547 7.39593 7.344080 4.568470 3.347250 2.230450  3.598560  2.470390  4.184450
 NM_001003920 1.06842 1.11961 1.38981 1.054000 0.833823 0.866511 0.795384  0.980946  0.731532  0.949049
 NM_001003953 7.50832 7.13316 4.10741 5.327390 2.311230 1.023050 2.573220  1.883740  3.215150  2.483410

library(reshape2)   # melt()
library(ggplot2)
library(scales)     # muted()

pd <- as.data.frame(scale(t(data)))
pd$Time <- sub("_.*", "", rownames(pd))
pd.m <- melt(pd, id.vars = "Time")
pd.m$variable <- as.numeric(factor(pd.m$variable,
  levels = rev(as.character(unique(pd.m$variable))), ordered = FALSE))
p <- ggplot(pd.m, aes(Time, variable))
p + geom_tile(aes(fill = value)) +
  scale_fill_gradient2(low = muted("blue"), high = muted("red")) +
  scale_x_discrete(labels = c("0h", "0.25h", "0.5h", "1h", "2h", "3h", "6h", "12h", "24h", "48h")) +
  theme_bw(base_size = 20) +
  theme(axis.text.x = element_text(angle = 0, vjust = 0.5, hjust = 0, size = 12),
        axis.text.y = element_text(size = 12),
        strip.text.y = element_text(angle = 0, vjust = 0.5, hjust = 0.5, size = 12),
        strip.text.x = element_text(size = 12)) +
  labs(y = "Genes", x = "Time (h)", fill = "")

Is there a way to cluster the plot so that it displays the dynamics of the time course? I would like to use the clustering that comes out of:

 hc.cols <- hclust(dist(t(data)))
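To use the clustering from `hc.cols` directly, note that an `hclust` object stores the leaf order of the dendrogram in `$order` and the time-point names in `$labels`. A minimal base-R sketch on a hypothetical random matrix (the gene names `gene1` to `gene6` and the values are made up; only the 10 columns `X0` to `X9` mirror the question):

```r
# Hypothetical stand-in for `data`: 6 genes (rows) x 10 time points (columns X0..X9)
set.seed(1)
data <- matrix(rexp(60), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("X", 0:9)))

# t(data) puts the time points in rows, so dist() compares time points
hc.cols <- hclust(dist(t(data)))

# $order: leaf order as indices into the rows of t(data), i.e. the time points
hc.cols$order

# $labels holds the time-point names, so this is the clustered column order
time_order <- hc.cols$labels[hc.cols$order]
time_order
```

Using `time_order` as the `levels` of the Time factor is what puts the x-axis in dendrogram order.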


user3741035 asked Aug 27 '14 13:08




2 Answers

You can achieve this by ordering the time points according to the dendrogram you get after applying hclust to your (scaled) data:

data <- scale(t(data))
ord <- hclust(dist(data, method = "euclidean"), method = "ward.D")$order
ord
## [1]  2  3  1  4  8  5  6 10  7  9

Then all you have to do is turn your Time column into a factor whose levels are ordered by ord:

library(reshape2)   # melt()

labs <- c("0h", "0.25h", "0.5h", "1h", "2h", "3h", "6h", "12h", "24h", "48h")

pd <- as.data.frame(data)
pd$Time <- sub("_.*", "", rownames(pd))
pd.m <- melt(pd, id.vars = "Time", variable.name = "Gene")

pd.m$Gene <- factor(pd.m$Gene, levels = colnames(data), labels = seq_along(colnames(data)))
# reorder the labels together with the levels so each time point keeps its own label
pd.m$Time <- factor(pd.m$Time, levels = rownames(data)[ord], labels = labs[ord])
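Because `factor()` matches `labels` to `levels` by position, reordering the levels with `ord` also requires reordering the display labels; otherwise each time point is shown with another time point's label. A small base-R check, assuming the rows `X0` to `X9` correspond to 0h to 48h in that order and reusing the `ord` printed above:

```r
ids  <- paste0("X", 0:9)                                # time-point IDs in natural order
labs <- c("0h", "0.25h", "0.5h", "1h", "2h", "3h", "6h", "12h", "24h", "48h")
ord  <- c(2, 3, 1, 4, 8, 5, 6, 10, 7, 9)                 # dendrogram order printed above

# Levels and labels permuted together: every ID keeps its own label,
# only the order of the levels (and hence of the axis) changes
right <- factor(ids, levels = ids[ord], labels = labs[ord])

# Only the levels permuted: labels are attached to the reordered levels by position
wrong <- factor(ids, levels = ids[ord], labels = labs)

data.frame(id = ids, right = as.character(right), wrong = as.character(wrong))
levels(right)
```

In the `wrong` column, `X0` comes out as "0.5h" and `X1` as "0h", which is exactly the mislabelling that would appear on the x-axis.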

ggplot2 does the rest automatically, since it draws a discrete axis in the order of the factor levels:

library(ggplot2)
library(scales)   # muted()

ggplot(pd.m, aes(Time, Gene)) +
  geom_tile(aes(fill = value)) +
  scale_fill_gradient2(low = muted("blue"), high = muted("red"))
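If you would rather not depend on reshape2 for `melt()`, the same long table can be built with base R's `stack()`; this is a swapped-in technique, not the answer's code. A sketch on hypothetical random data with the question's layout (6 made-up genes, columns `X0` to `X9`), drawing the plot only if ggplot2 is installed:

```r
set.seed(42)
# Hypothetical data: 6 genes x 10 time points, same layout as in the question
data <- matrix(rexp(60), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("X", 0:9)))
labs <- c("0h", "0.25h", "0.5h", "1h", "2h", "3h", "6h", "12h", "24h", "48h")

z   <- scale(t(data))                               # time points in rows, genes scaled
ord <- hclust(dist(z), method = "ward.D")$order     # dendrogram order of the time points

# Long format: stack() writes one column after another (values, ind = gene)
pd.m <- stack(as.data.frame(z))
names(pd.m) <- c("value", "Gene")
pd.m$Time <- rep(rownames(z), times = ncol(z))      # time-point IDs repeat per gene
pd.m$Time <- factor(pd.m$Time, levels = rownames(z)[ord], labels = labs[ord])

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(pd.m, aes(Time, Gene)) +
    geom_tile(aes(fill = value)) +
    scale_fill_gradient2(low = scales::muted("blue"), high = scales::muted("red"))
  pdf(NULL)                                         # null device so this runs non-interactively
  print(p)
  invisible(dev.off())
}
```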


Beasterfield answered Sep 17 '22 21:09


I don't think ggplot2 supports this out of the box, but you can use base R's heatmap():

 dat <- as.matrix(head(data))   # only the first rows of your data
 heatmap(
   dat, Rowv = NA,
   Colv = as.dendrogram(hclust(dist(t(dat))))
 )


Note that this won't look like yours, because I'm only using the head of your data, not the whole thing.

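Here we pass a dendrogram derived from your hclust call through the Colv argument, so the columns (time points) are ordered by your clustering, while Rowv = NA turns off clustering of the rows (genes). If you leave Colv at its default, heatmap() computes its own column clustering; supplying Colv yourself lets you override it when the default doesn't match what you want.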
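To confirm that heatmap() uses the supplied dendrogram rather than its own clustering, you can inspect the list it returns invisibly: `colInd` is the column permutation that was actually drawn. A sketch on a hypothetical random matrix standing in for head(data):

```r
set.seed(7)
# Hypothetical stand-in for head(data): 6 genes x 10 time points
dat <- matrix(rexp(60), nrow = 6,
              dimnames = list(paste0("gene", 1:6), paste0("X", 0:9)))

hc <- hclust(dist(t(dat)))                 # cluster the time points (columns)

pdf(NULL)                                  # null device so this runs non-interactively
res <- heatmap(dat, Rowv = NA, Colv = as.dendrogram(hc))
invisible(dev.off())

res$colInd   # column order used in the plot
hc$order     # leaf order of your clustering
```

With Rowv = NA the genes stay in their original order (`rowInd` is simply 1 to 6 here). Also note that heatmap() scales rows by default (`scale = "row"`); pass `scale = "none"` if your values are already scaled.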

BrodieG answered Sep 16 '22 21:09