I'm trying to graphically evaluate the distributions (bimodal vs. unimodal) of several datasets, where the number of data points per dataset can vary widely. I want to show where the individual data points lie, using something like rug plots, without letting a series with many data points overwhelm a series with only a few.
Currently I'm working in `ggplot2`, combining `geom_density` and `geom_rug` like so:
```r
# Set up data: 1000 bimodal "b" points; 20 unimodal "a" points
set.seed(0)
library(ggplot2)
x <- c(rnorm(500, mean=10, sd=1), rnorm(500, mean=5, sd=1), rnorm(20, mean=7, sd=1))
l <- c(rep("b", 1000), rep("a", 20))
d <- data.frame(x=x, l=l)
# One density curve and one rug per series, coloured by l
ggplot(d, aes(x=x, colour=l)) + geom_density() + geom_rug()
```
This almost does what I want, but the "a" points are overwhelmed by the "b" points.
I've hacked a solution using `geom_point` instead of `geom_rug`:
```r
# Manually chosen y position for each series' row of points
d$ypos <- NA
d$ypos[d$l=="b"] <- 0
d$ypos[d$l=="a"] <- 0.01
ggplot() +
  geom_density(data=d, aes(x=x, colour=l)) +
  geom_point(data=d, aes(x=x, y=ypos, colour=l), alpha=0.5)
```
However, this is unsatisfying because the y positions must be set by hand. Is there a more automatic way to separate the rug plots of different series, for instance using a position adjustment?
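As a minimal sketch (not from the original thread), the `geom_point` workaround can at least derive its y offsets automatically: each level of `l` gets its own row, and the row spacing is taken as a fraction of the tallest per-group density. The 5% spacing, `shape = "|"` and `size = 3` are assumptions chosen for this example data, not values from the post.

```r
library(ggplot2)

# Same example data as in the question
set.seed(0)
x <- c(rnorm(500, mean = 10, sd = 1), rnorm(500, mean = 5, sd = 1),
       rnorm(20, mean = 7, sd = 1))
l <- c(rep("b", 1000), rep("a", 20))
d <- data.frame(x = x, l = l)

# Approximate height of the tallest per-group density curve, using
# stats::density() with its default bandwidth (also geom_density()'s default)
peak <- max(tapply(d$x, d$l, function(v) max(density(v)$y)))

# Assumed spacing: successive rows are 5% of the peak height apart
step <- 0.05 * peak

# Row index from the factor levels: first level at y = 0, next at -step, ...
d$ypos <- -step * (as.integer(factor(d$l)) - 1)

p <- ggplot(d, aes(x = x, colour = l)) +
  geom_density() +
  geom_point(aes(y = ypos), shape = "|", size = 3, alpha = 0.5)
print(p)
```

Because the offsets come from the factor levels, adding a third series gives it a third row without any manual assignment.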
One way is to use two `geom_rug()` calls, one for "b" and the other for "a". Then set `sides="t"` in one of them so that its rug is drawn along the top of the plot instead of the bottom.
```r
ggplot(d, aes(x=x, colour=l)) + geom_density() +
  geom_rug(data=subset(d, l=="b"), aes(x=x)) +
  geom_rug(data=subset(d, l=="a"), aes(x=x), sides="t")
```
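If you would rather keep both rugs on the bottom, a possible variant (a sketch, not part of the answer above) relies on layer order: ggplot2 draws layers in the order they are added, so the small series can go in a later layer on top of a faded large series. The `alpha = 0.1` value is an assumption tuned by eye.

```r
library(ggplot2)

# Same example data as in the question
set.seed(0)
x <- c(rnorm(500, mean = 10, sd = 1), rnorm(500, mean = 5, sd = 1),
       rnorm(20, mean = 7, sd = 1))
l <- c(rep("b", 1000), rep("a", 20))
d <- data.frame(x = x, l = l)

p <- ggplot(d, aes(x = x, colour = l)) +
  geom_density() +
  # Large series first, faded so its many ticks form a light band
  geom_rug(data = subset(d, l == "b"), alpha = 0.1) +
  # Small series added last, so its ticks are drawn over the "b" ticks
  geom_rug(data = subset(d, l == "a"))
print(p)
```

This scales to more than two series (add the layers from largest to smallest), whereas the `sides` approach only offers a top and a bottom position for an x-axis rug.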