Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using pseudocolour in ggplot2 scatter plot to indicate density

Does someone know how to create a graph like the one in the screenshot? I've tried to get a similar effect adjusting alpha, but this renders outliers to be almost invisible. I know this type of graph only from a software called FlowJo, here they refer to it as "pseudocolored dot plot". Not sure if this an official term.

Screenshot from Corces et al., Nature Genetics 2016

I'd like to do it specifically in ggplot2, as I need the faceting option. I attached another screenshot of one of my graphs. The vertical lines depict clusters of mutations at certain genomic regions. Some of these clusters are much denser than others. I'd like to illustrate this using the density colors.

Rainfall Plot

The data is quite big and hard to simulate, but here's a try. I doesn't look like the actual data, but the data format is the same.

chr <- c(rep(1:10,1000))
position <- runif(10000, min=0, max=5e8)
distance <- runif(10000, min=1, max=1e5)
log10dist <- log10(distance)

df1 <- data.frame(chr, position, distance, log10dist)

ggplot(df1, aes(position, log10dist)) + 
  geom_point(shape=16, size=0.25, alpha=0.5, show.legend = FALSE) +
  facet_wrap(~chr, ncol = 5, nrow = 2, scales = "free_x")

Any help is highly appreciated.

like image 207
Peer Wünsche Avatar asked Aug 19 '16 12:08

Peer Wünsche


People also ask

How do you color code a scatter plot in R?

The different color systems available in R have been described in detail here. To change scatter plot color according to the group, you have to specify the name of the data column containing the groups using the argument groupName . Use the argument groupColors , to specify colors by hexadecimal code or by name .

What is density in scatter plot?

The density scatterplot is a type of two-dimensional histogram showing the count of points in each region of the plot. In this this case the plotting region—the grey square—is divided into 40,000 cells (200*200) of equal size.

How do I use ggMarginal?

ggExtra comes with an addin for ggMarginal() , which lets you interactively add marginal plots to a scatter plot. To use it, simply highlight the code for a ggplot2 plot in your script, and select ggplot2 Marginal Plots from the RStudio Addins menu.


1 Answers

library(ggplot2)
library(ggalt)
library(viridis)

chr <- c(rep(1:10,1000))
position <- runif(10000, min=0, max=5e8)
distance <- runif(10000, min=1, max=1e5)
log10dist <- log10(distance)

df1 <- data.frame(chr, position, distance, log10dist)

ggplot(df1, aes(position, log10dist)) + 
  geom_point(shape=16, size=0.25, show.legend = FALSE) +
  stat_bkde2d(aes(fill=..level..), geom="polygon") +
  scale_fill_viridis() +
  facet_wrap(~chr, ncol = 5, nrow = 2, scales = "free_x")

enter image description here

In practice, I'd take the initial bandwidth guess and then figure out an optimal bandwidth. Apart from taking the lazy approach and just plotting the points w/o filtering (smoothScatter() filters everything but the outliers based on npoints) this is generating the "smoothed scatterplot" like the example you posted.

smoothScatter() uses different defaults, so it comes out a bit differently:

par(mfrow=c(nr=2, nc=5))
for (chr in unique(df1$chr)) {
  plt_df <- dplyr::filter(df1, chr==chr)
  smoothScatter(df1$position, df1$log10dist, colramp=viridis)
}

enter image description here

geom_hex() is going to show the outliers, but not as distinct points:

ggplot(df1, aes(position, log10dist)) + 
  geom_point(shape=16, size=0.25, show.legend = FALSE, color="red") +
  scale_fill_viridis() +
  facet_wrap(~chr, ncol = 5, nrow = 2, scales = "free_x")

enter image description here

This:

ggplot(df1, aes(position, log10dist)) + 
  geom_point(shape=16, size=0.25) +
  stat_bkde2d(bandwidth=c(18036446, 0.05014539), 
              grid_size=c(128, 128), geom="polygon", aes(fill=..level..)) +
  scale_y_continuous(limits=c(3.5, 5.1)) +
  scale_fill_viridis() +
  facet_wrap(~chr, ncol = 5, nrow = 2, scales = "free_x") +
  theme_bw() +
  theme(panel.grid=element_blank())

enter image description here

gets you very close to the defaults smoothScatter() uses, but hackishly accomplishes most of what the nrpoints filtering code does in the smoothScatter() function solely by restricting the y axis limits.

like image 140
hrbrmstr Avatar answered Oct 28 '22 19:10

hrbrmstr