Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Drawing a contour line around connected cells in a heatmap in R

I have data with two time axes and measurements for each cell. From this I create a heatmap. I also know for each cell whether the measurement is significant.

My problem is to draw a contour line around all cells that are significant. If cells form clusters with the same significance value, I need to draw the contour around the cluster and not around each individual cell.

The data are in this format:

   x_time y_time    metric signif
1       1      1 0.3422285  FALSE
2       2      1 0.6114085  FALSE
3       3      1 0.5381621  FALSE
4       4      1 0.5175120  FALSE
5       1      2 0.6997991  FALSE
6       2      2 0.3054885  FALSE
7       3      2 0.8353888   TRUE
8       4      2 0.3991566   TRUE
9       1      3 0.7522728   TRUE
10      2      3 0.5311418   TRUE
11      3      3 0.4972816   TRUE
12      4      3 0.4330033   TRUE
13      1      4 0.5157972   TRUE
14      2      4 0.6324151   TRUE
15      3      4 0.4734126   TRUE
16      4      4 0.4315119   TRUE

The code below generates this data, where the measurements are random (dt$metrics) and the significance is logical (dt$signif).

# data example
dt <- data.frame(x_time=rep(seq(1, 4), 4), 
                 y_time=rep(seq(1, 4), each=4),
                 metric=(rnorm(16, 0.5, 0.2)),
                 signif=c(rep(FALSE, 6), rep(TRUE, 10)))

The heatmap alone can be generated using ggplot2's geom_tile

# Generate heatmap using ggplot2's geom_tile
library(ggplot2)
p <- ggplot(data = dt, aes(x = x_time, y = y_time))
p <- p + geom_tile(aes(fill = metric))

Based on this question, I managed to draw contours with different colors around each cell according to the significance value.

# Heatmap with lines around each significant cell
p <- ggplot(data = dt, aes(x = x_time, y = y_time))
p <- p + geom_tile(aes(fill = metric, color = signif), size = 2)
p <- p + scale_color_manual(values = c("black", "white"))

This Figure displays the result of this approach.

However, this approach does not group adjacent significant cells together by drawing a contour around the entire group (as is also discussed in the question I linked to).

As this question shows, it is possible to draw boxes around specified areas, but I do not think this can be extended to all possible clusters of cells.

like image 858
Christoph Aurnhammer Avatar asked Mar 20 '19 10:03

Christoph Aurnhammer


2 Answers

This answer is based on How to get contour lines around the grids in R-raster?.

library(data.table)
library(raster)

Also note that clump requires that igraph package is installed, and dissolve = TRUE in rasterToPolygons requires rgeos.

# convert data.frame to data.table
# not strictly necessary, but enables use of convenient functions: dcast and rbindlist.
setDT(d)

# reshape to wide 
d2 <- dcast(d, y ~ x, value.var = "sig")

# reverse order of rows to match raster order
# remove first column
# convert to matrix and then to raster
r <- raster(as.matrix(d2[ , .SD[.N:1, -1]]),
            xmn = 0, xmx = ncol(d2) - 1, ymn = 0, ymx = ncol(d2) - 1)

# detect clumps of connected cells of the value TRUE
# convert raster to polygons
# dissolve polygons into multi-polygons
polys <- rasterToPolygons(clump(r), dissolve = TRUE)

# grab coordinates of individual polygons and convert to a data.table
# use idcol = TRUE to enable grouping of paths when plotting
d_poly <- rbindlist(lapply(polys@polygons,
                           function(x) as.data.table(x@Polygons[[1]]@coords)),
                    idcol = TRUE)

# plot an outline around each 'patch of significant values' using geom_path 
ggplot(d, aes(x = x, y = y)) +
  geom_tile(aes(fill = z)) +
  geom_path(data = d_poly, aes(x = x + 0.5, y = y + 0.5, group = .id),
            size = 2, color = "red")

enter image description here


Data:

d <- structure(list(x = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
                          3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L),
                    y = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L,
                          1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L),
                    sig = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE,
                            TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE),
                    z = c(0.96, 0.76, 0.14, 0.93, 0.39, 0.06, 0.99, 0.77,
                          0.7, 0.72, 0.08, 0.94, 0.98,  0.83, 0.12, 0.42)),
               row.names = c(NA, -16L), class = "data.frame")
like image 123
Henrik Avatar answered Nov 18 '22 17:11

Henrik


Surely this would be a bit tedious if you were to create lots of heatmaps (even though it's probably possible to create data frames with the necessary values from your data), but otherwise you can play with geom_segments:

p + geom_segment(aes(x = .5, xend = 4.5, y = 4.5, yend = 4.5), colour = "white", size = 2) +
  geom_segment(aes(x = .5, xend = 2.5, y = 2.5, yend = 2.5), colour = "white", size = 2) +
  geom_segment(aes(x = 2.5, xend = 4.5, y = 1.5, yend = 1.5), colour = "white", size = 2) +
  geom_segment(aes(x = .5, xend = .5, y = 2.5, yend = 4.5), colour = "white", size = 2) +
  geom_segment(aes(x = 2.5, xend = 2.5, y = 1.5, yend = 2.5), colour = "white", size = 2) +
  geom_segment(aes(x = 4.5, xend = 4.5, y = 1.5, yend = 4.5), colour = "white", size = 2)

enter image description here

like image 38
erc Avatar answered Nov 18 '22 17:11

erc