Show only high density areas with ggplot2's stat_density_2d

Question

I'd like to use the stat_density_2d function with categorical variables, but restrict my plot to high-density areas in order to reduce overlap and improve legibility.

Let's take an example with the following data:

plot_data <-
  data.frame(X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
             Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
             Label = c(rep('A', 300), rep('B', 150)))

ggplot(plot_data, aes(X, Y, colour = Label)) + geom_point()

With a 2D-density plot, the densities overlap:

ggplot(plot_data, aes(X, Y)) + 
  stat_density_2d(geom = "polygon", aes(alpha = ..level.., fill = Label))

2D-density plot

Would it be possible to plot only high-density areas (for instance, level > 0.03)? The only solution I found is to "cheat" and manually modify the ..level.. variable, either with a step function or a power transformation, as in this simple example:

ggplot(plot_data, aes(X, Y)) + 
  stat_density_2d(geom = "polygon", aes(alpha = (..level..) ^ 2, fill = Label)) + 
  scale_alpha_continuous(range = c(0, 1))

2D-density plot with squared levels

Instead of modifying the ..level.. variable, is it possible to ask ggplot2's stat_density_2d function to focus only on a certain range of density levels? I've tried playing with the range and limits arguments of scale_alpha_continuous, without any useful result.

Thanks!

mpalanco · Accepted Answer

Option 1
Adding the bins argument to stat_density_2d limits the number of contour levels drawn. This reduces overplotting and draws attention to a handful of density areas with very little code.

library(ggplot2)
set.seed(123)
plot_data <-
  data.frame(
    X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
    Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
    Label = c(rep('A', 300), rep('B', 150))
  )
ggplot(plot_data, aes(X, Y, group = Label)) +
  stat_density_2d(geom = "polygon",
                  aes(alpha = ..level.., fill = Label),
                  bins = 4)
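
If the goal is specifically to drop everything below a cutoff (like the level > 0.03 mentioned in the question), one possible variation is to pass explicit contour levels through breaks instead of bins. This is only a sketch: it assumes ggplot2 3.3.0 or later, where stat_density_2d forwards breaks to the contour computation and after_stat() replaces the ..level.. notation. The cutoff min_level and the 0.005 step are illustrative values, not something taken from the question.

```r
library(ggplot2)

set.seed(123)
plot_data <- data.frame(
  X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
  Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
  Label = c(rep("A", 300), rep("B", 150))
)

# Hypothetical cutoff and spacing between contour levels; tune to your data
min_level <- 0.015
level_breaks <- seq(min_level, 0.05, by = 0.005)

# Only contours at or above min_level are computed, so the low-density
# outskirts of each group are never drawn
p <- ggplot(plot_data, aes(X, Y)) +
  stat_density_2d(
    geom = "polygon",
    aes(alpha = after_stat(level), fill = Label),
    breaks = level_breaks
  )
p
```

Breaks that exceed a group's maximum density simply produce no polygon for that group, so the upper end of the sequence can be generous.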

Option 2
Manually assign the colours, using NA for the levels we do not want to plot. The main disadvantage is that we need to know the number of levels and colours in advance (or compute them). In my example, with set.seed(123), we need 7.

ggplot(plot_data, aes(X, Y, group = Label)) +
  stat_density_2d(geom = "polygon", aes(fill = as.factor(..level..))) +
  scale_fill_manual(values = c(NA, NA, NA,"#BDD7E7", "#6BAED6", "#3182BD", "#08519C"))
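
To avoid counting the levels by hand, the palette can probably be built from the computed layer data. A minimal sketch, assuming layer_data() exposes the level column for this stat and ggplot2 3.3.0 or later for after_stat(). The cutoff min_level and the colour ramp endpoints are arbitrary choices for illustration, and na.value is set explicitly in case the scale's default would otherwise paint the dropped levels grey.

```r
library(ggplot2)

set.seed(123)
plot_data <- data.frame(
  X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
  Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
  Label = c(rep("A", 300), rep("B", 150))
)

# Base plot: one discrete fill value per contour level
p0 <- ggplot(plot_data, aes(X, Y, group = Label)) +
  stat_density_2d(geom = "polygon",
                  aes(fill = as.factor(after_stat(level))))

# Contour levels actually computed by the stat, in increasing order
lv <- sort(unique(layer_data(p0)$level))

# Hypothetical cutoff: levels below it get NA and are not drawn
min_level <- 0.015
keep <- lv >= min_level
pal <- rep(NA_character_, length(lv))
pal[keep] <- colorRampPalette(c("#BDD7E7", "#08519C"))(sum(keep))

p <- p0 +
  scale_fill_manual(values = pal, na.value = "transparent", name = "level")
p
```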

Marcelo · Answer

You have to compute the 2D kernel density manually and then plot the result. This way you can choose the value at each point, for example to avoid overlap. Here is the code:

library(ggplot2)
library(MASS)      # kde2d()
library(dplyr)     # group_by(), do(), mutate(), if_else()
library(tidyr)     # spread()
library(magrittr)  # %<>% assignment pipe

plot_data <-
  data.frame(X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
             Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
             Label = c(rep('A', 300), rep('B', 150)))

# Calculate the common range so both densities share the same grid
xlim <- range(plot_data$X)
ylim <- range(plot_data$Y)

# Generate the kernel density for each group
newplot_data <- plot_data %>% group_by(Label) %>% do(Dens=kde2d(.$X, .$Y, n=100, lims=c(xlim,ylim)))

# Transform the density into a data.frame
newplot_data  %<>%  do(Label=.$Label, V=expand.grid(.$Dens$x,.$Dens$y), Value=c(.$Dens$z)) %>% do(data.frame(Label=.$Label,x=.$V$Var1, y=.$V$Var2, Value=.$Value))

# Untidy the data and choose the value for each point.
# In this case, choose the value of the label with the highest density
newplot_data  %<>%   spread( Label,value=Value) %>%
  mutate(Level = if_else(A>B, A, B), Label = if_else(A>B,"A", "B"))

Contour plot:

# Contour plot
ggplot(newplot_data, aes(x,y, z=Level, fill=Label, alpha=..level..))  + stat_contour(geom="polygon")

It seems the contour plot has some overlap due to rounding errors. We can try a raster plot instead:

#Raster plot
ggplot(newplot_data, aes(x,y, fill=Label, alpha=Level))  + geom_raster()
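
Since the density grid is now an ordinary data frame, showing only the high-density areas can be as simple as filtering rows before plotting. Below is a rough, self-contained sketch of that idea using base R and MASS::kde2d rather than the dplyr pipeline above. The names dens_grid, high_density and min_level are mine, and the cutoff value is an arbitrary illustration.

```r
library(ggplot2)
library(MASS)

set.seed(123)
plot_data <- data.frame(
  X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
  Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
  Label = c(rep("A", 300), rep("B", 150))
)

# Common limits so every group is evaluated on the same grid
xlim <- range(plot_data$X)
ylim <- range(plot_data$Y)

# One density grid per label, stacked into a single long data frame
grid_list <- lapply(split(plot_data, plot_data$Label), function(d) {
  dens <- kde2d(d$X, d$Y, n = 100, lims = c(xlim, ylim))
  data.frame(
    expand.grid(x = dens$x, y = dens$y),  # x varies fastest, like dens$z
    Level = as.vector(dens$z),
    Label = d$Label[1]
  )
})
dens_grid <- do.call(rbind, grid_list)

# Hypothetical cutoff: keep only grid cells whose density reaches it
min_level <- 0.015
high_density <- dens_grid[dens_grid$Level >= min_level, ]

p <- ggplot(high_density, aes(x, y, fill = Label, alpha = Level)) +
  geom_raster()
p
```

Unlike the max-per-cell approach above, this keeps both labels wherever their dense regions overlap, so it trades strict non-overlap for a direct threshold on the density.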

Tags: r, ggplot2

Jonas
