
How to enforce ggplot's position_dodge on categories with no data?

Tags: r, ggplot2

I'm trying to use position_dodge in ggplot to obtain boxplots of two different signals (ind) sharing the same categories (cat). When a category has data for one signal but not for the other, the boxplot for the signal with data takes up the full horizontal space and does not respect the position_dodge setting for that category. Is there a way to make ggplot enforce the dodging rule? As you can see in the example below, signal x has no data for category B, so the space position_dodge would reserve for it is lost. I would like to avoid that.

Thanks in advance.

library(ggplot2)

# Signal x has only NA values in category B
data <- data.frame(cat    = c('A','A','A','A','B','B','A','A','A','A','B','B'),
                   values = c(3,2,1,4,NA,NA,4,5,6,7,8,9),
                   ind    = c('x','x','x','x','x','x','y','y','y','y','y','y'))

print(ggplot() +
        scale_colour_hue(guide = 'none') +
        geom_boxplot(aes(x = as.factor(cat), y = values, fill = ind),
                     position = position_dodge(width = .60),
                     data = data,
                     outlier.size = 1.2,
                     na.rm = TRUE))

[Image: graph with original problem]

asked Oct 09 '12 18:10 by JAponte


2 Answers

After some workarounds, I came up with the outcome I was looking for... (kind of)

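library(ggplot2)

# Same data as in the question, plus a dummy row (cat 'B', ind 'x', value 0)
# so that signal x also gets a box in category B
data <- data.frame(
  cat    = c('A','A','A','A','B','B','A','A','A','A','B','B','B'),
  values = c(3,2,1,4,NA,NA,4,5,6,7,8,9,0),
  ind    = c('x','x','x','x','x','x','y','y','y','y','y','y','x'))

p <- ggplot() +
  scale_colour_hue(guide = 'none') +
  geom_boxplot(aes(x = as.factor(cat), y = values, fill = ind),
               position = position_dodge(width = .60),
               data = data,
               outlier.size = 1.2,
               na.rm = TRUE) +
  # White line at y = 0 that covers the flat dummy box
  geom_line(aes(x = x, y = y),
            data = data.frame(x = c(0, 3), y = rep(0, 2)),
            size = 1,  # newer ggplot2 releases prefer linewidth = 1
            col = 'white')
print(p)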

[Image: solution with workaround]
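The extra ('B', 'x', 0) row above was added by hand. As a rough sketch of how the same padding could be automated in base R, one could look for every cat/ind combination that has no non-missing values and append a placeholder row for each. The names obs, counts, empty and padded, and the placeholder value 0, are illustrative assumptions and not part of the original workaround:

```r
# Same data as in the question (12 rows; x has only NA values in category B).
data <- data.frame(cat    = c('A','A','A','A','B','B','A','A','A','A','B','B'),
                   values = c(3,2,1,4,NA,NA,4,5,6,7,8,9),
                   ind    = c('x','x','x','x','x','x','y','y','y','y','y','y'))

# Count the non-missing observations for every cat/ind combination.
obs    <- data[!is.na(data$values), ]
counts <- as.data.frame(table(cat = obs$cat, ind = obs$ind),
                        stringsAsFactors = FALSE)

# Combinations that would produce no boxplot at all.
empty <- counts[counts$Freq == 0, c("cat", "ind")]

# Append one placeholder row per empty combination.
# The value 0 is an assumption: it has to match the y position of the
# white geom_line that hides the dummy box.
padded <- rbind(data,
                data.frame(cat    = empty$cat,
                           values = rep(0, nrow(empty)),
                           ind    = empty$ind))
```

Passing padded instead of data to geom_boxplot should then reproduce the hand-built data frame. Note that this simple version only detects combinations whose cat and ind values each appear somewhere among the non-missing rows.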

Some people recommended using faceting for the effect I wanted. Faceting doesn't give me the effect I'm looking for. The final graph I was looking for is shown below:

[Image: final graph]

Notice that the white major grid line at y = 10 is thicker than the others. That thicker line is the geom_line with size=1 that hides the unwanted boxplots.

I wish we could combine different geom objects more seamlessly. I reported this as a bug on Hadley's GitHub, but Hadley said this is how position_dodge behaves by design. I guess I'm using ggplot2 in a non-standard way, and workarounds are the way to go for these kinds of issues. Anyway, I hope this helps some R folks push ggplot's great functionality a little further.
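For readers on a newer ggplot2 than the one this was written against: later releases added a preserve argument to position_dodge() (and a position_dodge2() variant). A minimal sketch, assuming an installed ggplot2 version that has this argument, which keeps each box at the width of a single dodged element without a dummy row or a covering line:

```r
library(ggplot2)

# Original data from the question: x has only NA values in category B.
data <- data.frame(cat    = c('A','A','A','A','B','B','A','A','A','A','B','B'),
                   values = c(3,2,1,4,NA,NA,4,5,6,7,8,9),
                   ind    = c('x','x','x','x','x','x','y','y','y','y','y','y'))

# preserve = "single" asks the dodge to keep the width of one element
# instead of letting a lone box expand to the full dodge width.
p <- ggplot(data, aes(x = factor(cat), y = values, fill = ind)) +
  geom_boxplot(position = position_dodge(width = 0.60, preserve = "single"),
               outlier.size = 1.2,
               na.rm = TRUE)
print(p)
```

The lone box in category B should keep the same width as the boxes in A, but where it sits horizontally within the category can differ between position_dodge() and position_dodge2(), so it is worth checking the output against the layout you need.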

answered Oct 26 '22 12:10 by JAponte


I got the idea of using faceting from one of the comments Hadley posted on his GitHub site, so credit goes to Hadley, the maintainer of the ggplot2 package!

See if this is what you wanted. To learn more about the options for setting the whiskers and other elements of this plot, check this help page in the ggplot2 package:

?stat_boxplot

library(ggplot2)

data <- data.frame(cat    = c('A','A','A','A','B','B','A','A','A','A','B','B'),
                   values = c(3,2,1,4,NA,NA,4,5,6,7,8,9),
                   ind    = c('x','x','x','x','x','x','y','y','y','y','y','y'))

# One panel per signal (ind), so every box keeps the same width
p <- ggplot(data = data, aes(factor(cat), values))
p + stat_boxplot(geom = "boxplot", position = "dodge", width = 0.60, na.rm = TRUE) +
  facet_grid(. ~ ind)


To add colors to your plot (redundant in my opinion, since the plot is already faceted by the "ind" variable), try this:

p <- ggplot(data, aes(factor(cat), values, fill = ind))                     
p + stat_boxplot(geom="boxplot", position = "dodge", width = 0.60, na.rm = TRUE) + facet_grid(.~ind)


HTH!

answered Oct 26 '22 11:10 by Sathish