ggplot2: add conditional density curves describing both dimensions of scatterplot

Tags: plot, r, ggplot2

I have scatterplots of 2D data from two categories. I want to add density lines for each dimension -- not outside the plot (cf. Scatterplot with marginal histograms in ggplot2) but right on the plotting surface. I can get this for the x-axis dimension, like this:

library(ggplot2)

set.seed(123)
dim1 <- c(rnorm(100, mean=1), rnorm(100, mean=4))
dim2 <- rnorm(200, mean=1)
cat <- factor(c(rep("a", 100), rep("b", 100)))
# data.frame() keeps cat as a factor; cbind() would coerce it to integer codes
mydf <- data.frame(dim1, dim2, cat)
ggplot(data=mydf, aes(x=dim1, y=dim2, colour=as.factor(cat))) + 
  geom_point() +
  stat_density(aes(x=dim1, y=(-2+(..scaled..))), 
               position="identity", geom="line")

It looks like this: [image: resulting plot]

But I want an analogous pair of density curves running vertically, showing the distribution of points in the y-dimension. I tried

stat_density(aes(y=dim2, x=0+(..scaled..)), position="identity", geom="line")

but I get the error "stat_density requires the following missing aesthetics: x".

Any ideas? Thanks.

asked Jul 01 '15 18:07 by D Swingley


3 Answers

You can compute the density of dim2 for each category, swap the x and y coordinates, and store the result in a new data.frame. After that it is simply a matter of plotting them on top of the other graph.

p <- ggplot(data=mydf, aes(x=dim1, y=dim2, colour=as.factor(cat))) + 
  geom_point() +
  stat_density(aes(x=dim1, y=(-2+(..scaled..))), 
               position="identity", geom="line")

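stuff <- ggplot_build(p)
# Extract the x range so the new densities start at the y-axis.
# ggplot2 >= 3.0.0 stores it under layout$panel_params; older versions
# used stuff[[2]]$ranges[[1]]$x.range instead.
xrange <- stuff$layout$panel_params[[1]]$x.range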

## Get densities of dim2
ds <- do.call(rbind, lapply(unique(mydf$cat), function(lev) {
    dens <- with(mydf, density(dim2[cat==lev]))
    data.frame(x=dens$y+xrange[1], y=dens$x, cat=lev)
}))

p + geom_path(data=ds, aes(x=x, y=y, color=factor(cat)))
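The "swap the coordinates" step relies on what base R's density() returns: a list whose x component is the evaluation grid and whose y component is the estimated density at each grid point. A minimal stand-alone sketch of just that swap, on made-up data (the names v, dens and vertical are hypothetical, not from the answer):

```r
# Stand-alone illustration of the coordinate swap (hypothetical data).
set.seed(123)
v <- rnorm(100, mean = 1)

dens <- density(v)  # dens$x: evaluation grid (512 points by default)
                    # dens$y: estimated density at each grid point

# Swap the coordinates so the curve runs vertically:
# data values go on the y-axis, density height on the x-axis.
vertical <- data.frame(x = dens$y, y = dens$x)
head(vertical)
```

In the answer, xrange[1] is then added to the x column so that each curve starts at the left edge of the panel.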


answered Oct 07 '22 16:10 by Rorschach


So far I can produce:

distrib_horiz <- stat_density(aes(x=dim1, y=(-2+(..scaled..))), 
                              position="identity", geom="line")

ggplot(data=mydf, aes(x=dim1, y=dim2, colour=as.factor(cat))) + 
  geom_point() + distrib_horiz


And:

distrib_vert <- stat_density(data=mydf, aes(x=dim2, y=(-2+(..scaled..))), 
                             position="identity", geom="line") 

ggplot(data=mydf, aes(x=dim2, y=dim1, colour=as.factor(cat))) + 
  geom_point() + distrib_vert + coord_flip()


But combining them is proving tricky.
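Combining the two may be straightforward with ggplot2 3.3.0 or later: that release added an orientation argument to stat_density() and introduced after_stat() as the replacement for the ..scaled.. notation, so the density can be computed along y directly. This also sidesteps the "missing aesthetics: x" error quoted in the question. A sketch assuming such a version is installed; the offset of -2 for the vertical curves is an arbitrary choice that mirrors the question's horizontal offset:

```r
library(ggplot2)  # assumes ggplot2 >= 3.3.0 (orientation argument, after_stat())

set.seed(123)
mydf <- data.frame(
  dim1 = c(rnorm(100, mean = 1), rnorm(100, mean = 4)),
  dim2 = rnorm(200, mean = 1),
  cat  = factor(rep(c("a", "b"), each = 100))
)

p <- ggplot(mydf, aes(x = dim1, y = dim2, colour = cat)) +
  geom_point() +
  # Horizontal curves: density of dim1, scaled to a maximum of 1, shifted to y = -2
  stat_density(aes(x = dim1, y = -2 + after_stat(scaled)),
               position = "identity", geom = "line") +
  # Vertical curves: density of dim2 computed along y, shifted to x = -2
  stat_density(aes(y = dim2, x = -2 + after_stat(scaled)),
               orientation = "y", position = "identity", geom = "line")
print(p)
```

Both sets of curves use the scaled density (maximum 1 per category), so each occupies a one-unit band, at the bottom and the left of the panel respectively; change the offsets to move them.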

answered Oct 07 '22 15:10 by C8H10N4O2


So far I have only a partial solution: I didn't manage to get a vertical stat_density line for each individual category, only for the whole data set. Maybe this can nevertheless serve as a starting point for a better solution. My suggestion is to try the ggMarginal() function from the ggExtra package.

library(ggExtra)

p <- ggplot(data=mydf, aes(x=dim1, y=dim2, colour=as.factor(cat))) + 
  geom_point() +
  stat_density(aes(x=dim1, y=(-2+(..scaled..))), 
               position="identity", geom="line")
ggMarginal(p, type = "density", margins = "y", size = 4)

This is what I obtain: [image: resulting plot with a marginal density for dim2]

I know it's not perfect, but maybe it's a step in a helpful direction. At least I hope so. Looking forward to seeing other answers.
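The per-category limitation mentioned above may not apply to newer ggExtra releases: later versions (around 0.7 onward) added groupColour and groupFill arguments to ggMarginal(), which draw one marginal density per colour or fill group. A sketch assuming such a version is installed; note the densities are still drawn in the margin rather than on the plotting surface the question asks for, and the in-panel stat_density layer from the answer is omitted here for brevity:

```r
library(ggplot2)
library(ggExtra)  # assumes a ggExtra version with the groupColour argument

set.seed(123)
mydf <- data.frame(
  dim1 = c(rnorm(100, mean = 1), rnorm(100, mean = 4)),
  dim2 = rnorm(200, mean = 1),
  cat  = factor(rep(c("a", "b"), each = 100))
)

p <- ggplot(mydf, aes(x = dim1, y = dim2, colour = cat)) +
  geom_point()

# groupColour = TRUE draws one marginal density per colour group
# instead of a single density for the whole data set.
m <- ggMarginal(p, type = "density", margins = "y", size = 4,
                groupColour = TRUE)
print(m)
```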

answered Oct 07 '22 15:10 by RHertel