How can I make geom_boxplot outliers "line up" with jittered geom_points?

How can I make geom_boxplot outliers overlay perfectly with jittered geom_points?

For example, I want the outliers from geom_boxplot to be displayed as "cross hairs" over their actual points from geom_point after jittering.

library(ggplot2)
p <- ggplot(mtcars, aes(factor(cyl), mpg)) + 
  geom_boxplot(outlier.shape=10, outlier.size=8)  +
  geom_point(aes(factor(cyl), mpg, color=mpg),  position="jitter", size=4)
p


Any assistance would be greatly appreciated.

Asked by MikeTP, Mar 28 '13 at 15:03


2 Answers

I agree with Didzis that a solution that does exactly what you aim for is going to be fairly involved. To literally do what you suggest would require (I think) that you do both the jittering and the outlier calculation outside of ggplot. If you're flexible about how you highlight the outliers, this is a potentially shorter solution:

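library(ggplot2)
library(plyr)  # provides ddply() and .()

# Label each value as an outlier or not, using the 1.5 * IQR rule
id_outliers <- function(x){
    q <- quantile(x,c(0.25,0.75))
    iqr <- abs(diff(q))
    ifelse((x < q[1] - 1.5*iqr) | (x > q[2] + 1.5*iqr),'Outlier','NotOutlier')
}

# Apply the labelling within each cyl group
mtcars <- ddply(mtcars,
                .(cyl),
                transform,
                out = id_outliers(mpg))

# Hide the boxplot's own outliers and mark them through the point shape instead
p <- ggplot(mtcars, aes(factor(cyl), mpg)) + 
  geom_boxplot(outlier.colour = NA)  + 
  geom_point(aes(colour = mpg,shape = out),position = "jitter")
p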
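
For the literal version (jittering and outlier calculation both done outside ggplot), a minimal sketch might look like the following. The names d, is_out and x_jit are made up for illustration, and the jitter width of 0.3 is an arbitrary choice. It relies on the boxplot layer being added first, so that the x scale is discrete and the hand-jittered numeric positions are placed on it.

```r
library(ggplot2)

set.seed(1)  # make the hand-made jitter reproducible

d <- mtcars

# TRUE for values outside 1.5 * IQR of the quartiles (hypothetical helper)
is_out <- function(x) {
  q <- unname(quantile(x, c(0.25, 0.75)))
  iqr <- q[2] - q[1]
  x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr
}

# ave() returns 0/1 here because mpg is numeric, so compare with 1
d$out <- ave(d$mpg, d$cyl, FUN = is_out) == 1

# Jitter x by hand so both point layers share identical coordinates
d$x_jit <- as.numeric(factor(d$cyl)) + runif(nrow(d), -0.3, 0.3)

p <- ggplot(d, aes(factor(cyl), mpg)) +
  geom_boxplot(outlier.shape = NA) +                    # discrete x scale first
  geom_point(aes(x = x_jit, colour = mpg), size = 4) +  # all points
  geom_point(data = d[d$out, ], aes(x = x_jit),
             shape = 10, size = 8)                      # cross hairs on outliers
p
```

Because the cross hairs reuse the same x_jit column as the coloured points, they sit exactly on top of them, at the cost of managing the jitter yourself.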
Answered by joran, Oct 16 '22 at 01:10


This solution is fairly long. The problem is that with position="jitter" you can't get the exact coordinates of the points, so a workaround is needed.

Take your original plot and build it with ggplot_build(). The first element of data contains the information about the boxplots. We are interested in the columns group and outliers, which show which values ggplot treats as outliers. Save them as a separate object.

p <- ggplot(mtcars, aes(factor(cyl), mpg)) + 
                geom_boxplot(outlier.shape=10, outlier.size=8)  +
                geom_point(aes(color=mpg),  position="jitter", size=4)
gg<-ggplot_build(p)
gg$data[[1]]
  ymin lower middle upper ymax         outliers notchupper notchlower x PANEL group weight ymin_final
1 21.4 22.80   26.0 30.40 33.9                    29.62055   22.37945 1     1     1      1       21.4
2 17.8 18.65   19.7 21.00 21.4                    21.10338   18.29662 2     1     2      1       17.8
3 13.3 14.40   15.2 16.25 18.7 10.4, 10.4, 19.2   15.98120   14.41880 3     1     3      1       10.4
  ymax_final  xmin  xmax
1       33.9 0.625 1.375
2       21.4 1.625 2.375
3       19.2 2.625 3.375

xx<-gg$data[[1]][c("group","outliers")]
xx
  group         outliers
1     1                 
2     2                 
3     3 10.4, 10.4, 19.2

Now change the group values to 4, 6 and 8 so that they match the cyl values.

xx$group<-c(4,6,8)
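
If hard-coding 4, 6 and 8 feels fragile, the group index could instead be mapped back to the cyl levels. This is only a sketch: it assumes the group numbers follow the order of levels(factor(cyl)), which appears to hold when the only discrete variable is the one on the x axis. The names bx and cyl_levels are made up for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot()

# Boxplot statistics of the first layer, one row per box
bx <- ggplot_build(p)$data[[1]][c("group", "outliers")]

# Assumption: group i corresponds to the i-th level of factor(cyl)
cyl_levels <- levels(factor(mtcars$cyl))
bx$cyl <- as.numeric(cyl_levels[bx$group])
```

The cyl column can then be used as the merge key in place of the overwritten group column.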

Now merge this new data frame with the original mtcars and save the result as a new data frame. Then apply a function that checks whether a particular mpg value is listed in outliers for that cyl level. The results (TRUE and FALSE) are saved in the column out.

mtcars.new<-merge(mtcars,xx,by.x="cyl",by.y="group")
mtcars.new$out<-apply(mtcars.new,1,function(x) x$mpg %in% x$outliers)

Use the new data frame to plot the data. Remove the outliers from geom_boxplot(). Use the column out to determine the shape and size of the points, and adjust their appearance with scale_shape_manual() and scale_size_manual().

ggplot(mtcars.new, aes(factor(cyl), mpg)) + 
          geom_boxplot(outlier.shape = NA)  +
          geom_point(aes(color=mpg,shape=out,size=out),  position="jitter")+
          scale_shape_manual(values=c(16,10),guide="none")+
          scale_size_manual(values=c(4,8),guide="none")


Answered by Didzis Elferts, Oct 16 '22 at 02:10