
Is it possible to jitter two ggplot geoms in the same way?

Tags: r, ggplot2

Using position_jitter creates random jitter to prevent overplotting of data points.

In the example below I use baseball statistics to illustrate my problem. When I plot the same data in two layers, identical position_jitter calls shift the two geoms by slightly different amounts. This makes sense, because the random jitter is presumably generated independently in each call, but it causes the problem visible in the resulting plot.

library(ggplot2)
library(plyr)  # provides the baseball data set
p <- ggplot(baseball, aes(x = round(year, -1), y = sb, color = factor(lg)))
p <- p + stat_summary(fun.data = "mean_cl_normal", position = position_jitter(width = 3, height = 0)) +
  coord_cartesian(ylim = c(0, 40))
p + stat_summary(fun.y = mean, geom = "line", position = position_jitter(width = 3, height = 0))

Although the error bar points and the line refer to the same data, they are disjointed: the lines and points do not connect.

Is there a workaround for this? I thought position_dodge might be the answer, but it doesn't seem to work with these kinds of plots. Alternatively, maybe there's some way to get the mean_cl_normal call to also add the lines?

Alex Holcombe asked Jul 02 '10 11:07



2 Answers

I think so, by setting the seed to be the same in the two instances:

p <- ggplot(baseball, aes(x = round(year, -1), y = sb, color = factor(lg)))
myseed <- 2010
set.seed(myseed)
p <- p + stat_summary(fun.data = "mean_cl_normal",
                      position = position_jitter(width = 3, height = 0)) +
  coord_cartesian(ylim = c(0, 40))
set.seed(myseed)
p + stat_summary(fun.y = mean, geom = "line",
                 position = position_jitter(width = 3, height = 0))

This resets the random number generator to the same starting state that was used for the first call. However, I don't know how you could extract the random increments added to the values.
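
As a hedged alternative to calling set.seed by hand: later ggplot2 releases give position_jitter() a seed argument, so one position object can be shared by both layers. The sketch below assumes such a release (it also uses fun, the later replacement for fun.y), that plyr supplies baseball, and that Hmisc is installed for mean_cl_normal. The names pj, pts and lns are illustrative. layer_data() returns the data as drawn, which is one way to look at the jittered x values.

```r
library(ggplot2)
library(plyr)  # assumed source of the baseball data set

# One position object with a fixed seed, reused by both layers.
# Assumption: this ggplot2 version supports position_jitter(seed = ...).
pj <- position_jitter(width = 3, height = 0, seed = 2010)

p <- ggplot(baseball, aes(x = round(year, -1), y = sb, color = factor(lg))) +
  stat_summary(fun.data = "mean_cl_normal", position = pj) +
  stat_summary(fun = mean, geom = "line", position = pj) +
  coord_cartesian(ylim = c(0, 40))

# Inspect the positions actually drawn in each layer.
pts <- layer_data(p, 1)  # point ranges
lns <- layer_data(p, 2)  # line
head(pts$x)
head(lns$x)
```

If the two x columns agree, the line should pass through the points; if they do not, jittering the data before plotting is the safer route.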

nullglob answered Sep 28 '22 16:09



This is a weakness in the current ggplot2 syntax: there's no way to work around it except to add the jitter yourself.
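
A minimal sketch of what adding the jitter yourself could look like, not a definitive recipe: summarise the data first, draw the random offsets once, store them in a column, and let every geom read that column. It assumes plyr supplies baseball. The names d, summ, se and x are illustrative, and the interval is a rough normal approximation (mean ± 1.96 standard errors) rather than the exact mean_cl_normal result.

```r
library(ggplot2)
library(plyr)  # assumed here only as the source of the baseball data

# Summarise once: mean and standard error of sb per decade and league.
d <- transform(baseball, decade = round(year, -1))
d <- d[!is.na(d$sb), ]
summ <- aggregate(sb ~ decade + lg, data = d, FUN = mean)
summ$se <- aggregate(sb ~ decade + lg, data = d,
                     FUN = function(v) sd(v) / sqrt(length(v)))$sb

# Draw the jitter once and store it, so all geoms share the same x.
set.seed(2010)
summ$x <- summ$decade + runif(nrow(summ), min = -3, max = 3)

ggplot(summ, aes(x, sb, color = factor(lg))) +
  geom_linerange(aes(ymin = sb - 1.96 * se, ymax = sb + 1.96 * se)) +
  geom_point() +
  geom_line() +
  coord_cartesian(ylim = c(0, 40))
```

Because the offsets live in the data frame, they can also be inspected or reused directly, for example as summ$x - summ$decade.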

Or you could do something like this:

ggplot(baseball, aes(round(year, -1) + as.numeric(factor(lg)), sb, color = factor(lg))) +
  stat_summary(fun.data = "mean_cl_normal") +
  stat_summary(fun.y = mean, geom = "line") +
  coord_cartesian(ylim = c(0, 40))
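
The shift in the x aesthetic above is deterministic rather than random: as.numeric(factor(lg)) turns each league into its integer level code, so both layers see the same shifted x. A small illustration with made-up values (the vectors below are hypothetical and not taken from baseball):

```r
# Toy league labels and years (hypothetical, for illustration only).
lg <- c("NL", "AL", "NL", "AA")
year <- c(1871, 1906, 1923, 1884)

# factor() sorts the levels alphabetically (AA, AL, NL);
# as.numeric() returns the level codes.
offset <- as.numeric(factor(lg))
offset     # 3 2 3 1

# Decade plus a fixed per-league shift: identical in every layer.
x_shifted <- round(year, -1) + offset
x_shifted  # 1873 1912 1923 1881
```

With many leagues the shifts grow to the number of levels, so scaling the offset may be needed to keep groups within their decade.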
hadley answered Sep 28 '22 16:09
