I'd like to use the stat_density_2d function with categorical variables, but restrict my plot to high-density areas in order to reduce overlapping and improve legibility.
Let's take an example with the following data:
library(ggplot2)
plot_data <-
  data.frame(X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
             Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
             Label = c(rep('A', 300), rep('B', 150)))
ggplot(plot_data, aes(X, Y, colour = Label)) + geom_point()
With a 2D density plot we obtain overlapping densities:
ggplot(plot_data, aes(X, Y)) +
  stat_density_2d(geom = "polygon", aes(alpha = ..level.., fill = Label))
Would it be possible to plot only the high-density areas (for instance level > 0.03)? The only solution I found is to "cheat" and manually modify the ..level.. variable, either with a step function or a power transformation, as in this simple example:
ggplot(plot_data, aes(X, Y)) +
  stat_density_2d(geom = "polygon", aes(alpha = (..level..) ^ 2, fill = Label)) +
  scale_alpha_continuous(range = c(0, 1))
Instead of modifying the ..level.. variable, is it possible to ask ggplot2 (or stat_density_2d) to focus only on a certain range of density levels? I've tried the range and limits arguments of scale_alpha_continuous, without any useful result.
Thanks!
Option 1
Adding the bins argument to stat_density_2d limits the number of contour levels that are drawn. This reduces overplotting and draws attention to a small number of density areas with very little extra code.
set.seed(123)
plot_data <-
  data.frame(
    X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
    Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
    Label = c(rep('A', 300), rep('B', 150))
  )
ggplot(plot_data, aes(X, Y, group = Label)) +
  stat_density_2d(geom = "polygon",
                  aes(alpha = ..level.., fill = Label),
                  bins = 4)
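A possible variation on Option 1, closer to the "only levels above a cutoff" idea in the question: recent ggplot2 versions (3.3.0 or later, as far as I know; please check your installed version) document a breaks argument that stat_density_2d passes on to the contour computation, so contours are drawn only at the density values you list. The cutoff values in density_breaks are my own illustrative assumption, not values from the question, and after_stat(level) is the newer spelling of ..level.. in those versions. Treat this as a sketch rather than a definitive recipe.

```r
library(ggplot2)

set.seed(123)
plot_data <- data.frame(
  X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
  Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
  Label = c(rep("A", 300), rep("B", 150))
)

# Assumed cutoffs: contours are requested only at these density values,
# so everything below the smallest one is left undrawn.
density_breaks <- c(0.015, 0.02, 0.025)

p <- ggplot(plot_data, aes(X, Y, group = Label)) +
  stat_density_2d(
    geom = "polygon",
    aes(alpha = after_stat(level), fill = Label),
    breaks = density_breaks
  )
p
```

If a group never reaches the smallest break, it simply produces no polygon, so the breaks need to be chosen with the actual density range of the data in mind.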
Option 2
Assign the colours manually, using NA for the levels we do not want to plot. The main disadvantage is that we need to know the number of levels (and therefore colours) in advance, or compute it. In my example with set.seed(123) we need 7.
ggplot(plot_data, aes(X, Y, group = Label)) +
  stat_density_2d(geom = "polygon", aes(fill = as.factor(..level..))) +
  scale_fill_manual(values = c(NA, NA, NA, "#BDD7E7", "#6BAED6", "#3182BD", "#08519C"))
You have to generate the 2D kernel density manually and then plot the result. This way you can choose the value at each grid point yourself, for example to avoid overlap. Here is the code:
plot_data <-
  data.frame(X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
             Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
             Label = c(rep('A', 300), rep('B', 150)))
library(ggplot2)
library(MASS)      # kde2d()
library(dplyr)     # group_by(), do(), mutate(), if_else(), %>%
library(tidyr)     # spread()
library(magrittr)  # %<>%
# Calculate the range, so both groups share the same grid
xlim <- range(plot_data$X)
ylim <- range(plot_data$Y)
# Generate the kernel density for each group
newplot_data <- plot_data %>%
  group_by(Label) %>%
  do(Dens = kde2d(.$X, .$Y, n = 100, lims = c(xlim, ylim)))
# Transform the density into a data.frame
newplot_data %<>%
  do(Label = .$Label, V = expand.grid(.$Dens$x, .$Dens$y), Value = c(.$Dens$z)) %>%
  do(data.frame(Label = .$Label, x = .$V$Var1, y = .$V$Var2, Value = .$Value))
# Spread the data to one column per label and choose the value for each point.
# In this case, choose the value of the label with the highest density.
newplot_data %<>%
  spread(Label, value = Value) %>%
  mutate(Level = if_else(A > B, A, B), Label = if_else(A > B, "A", "B"))
Contour plot:
# Contour plot
ggplot(newplot_data, aes(x, y, z = Level, fill = Label, alpha = ..level..)) +
  stat_contour(geom = "polygon")
It seems the contour plot still has some overlap, probably due to rounding errors. We can try a raster plot instead:
#Raster plot
ggplot(newplot_data, aes(x,y, fill=Label, alpha=Level)) + geom_raster()
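For readers who prefer fewer dependencies, the same grid can be sketched with base R and MASS only, and a density cutoff can then be applied directly by filtering rows. The names lims, dens, grid, threshold and high_density, as well as the cutoff value 0.015, are my own illustrative assumptions and not part of the code above.

```r
library(MASS)  # kde2d()

set.seed(123)
plot_data <- data.frame(
  X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
  Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
  Label = c(rep("A", 300), rep("B", 150))
)

# Shared limits so both densities are evaluated on the same 100 x 100 grid
lims <- c(range(plot_data$X), range(plot_data$Y))

# One kernel density estimate per label
dens <- lapply(split(plot_data, plot_data$Label), function(d) {
  kde2d(d$X, d$Y, n = 100, lims = lims)
})

# kde2d() returns z with x along rows, so c(z) varies x fastest,
# which matches the row order produced by expand.grid(x, y)
grid <- expand.grid(x = dens$A$x, y = dens$A$y)
grid$A <- c(dens$A$z)
grid$B <- c(dens$B$z)

# Keep the larger density at each grid point and remember which label it is
grid$Level <- pmax(grid$A, grid$B)
grid$Label <- ifelse(grid$A > grid$B, "A", "B")

# Assumed cutoff: drop every grid cell whose density is not above it
threshold <- 0.015
high_density <- grid[grid$Level > threshold, ]
```

Passing high_density instead of newplot_data to the geom_raster call above should then draw only the cells above the cutoff, with no overlap between the two labels.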