
stat_sum and stat_identity give weird results




I have the following code, including randomly generated demo data:

n <- 10
group <- rep(1:4, n)
mass.means <- c(10, 20, 15, 30)
mass.sigma <- 4
score.means <- c(5, 5, 7, 4)
score.sigma <- 3
mass <- as.vector(model.matrix(~0+factor(group)) %*% mass.means) +
  rnorm(n*4, 0, mass.sigma)
score <- as.vector(model.matrix(~0+factor(group)) %*% score.means) +
  rnorm(n*4, 0, score.sigma)
data <- data.frame(id = 1:(n*4), group, mass, score)

Which gives:

  id group      mass    score
1  1     1 12.643603 5.015746
2  2     2 21.458750 5.590619
3  3     3 15.757938 8.777318
4  4     4 32.658551 6.365853
5  5     1  6.636169 5.885747
6  6     2 13.467437 6.390785

And then I want to plot the sum of "score", grouped by "group", in a bar chart:

plot <- ggplot(data = data, aes(x = group, y = score)) + 
  geom_bar(stat="sum")

This gives me: [image]

Weirdly, using stat_identity seems to give the result I am looking for:

plot <- ggplot(data = data, aes(x = group, y = score)) + 
  geom_bar(stat="identity")

[image]

Is this a bug? Using ggplot2 1.0.0 on R

platform       x86_64-pc-linux-gnu         
arch           x86_64                      
os             linux-gnu                   
system         x86_64, linux-gnu           
major          3                           
minor          1.2                         
year           2014                        
month          10                          
day            31                          
svn rev        66913                       
language       R                           
version.string R version 3.1.2 (2014-10-31)
nickname       Pumpkin Helmet    

Or what am I doing wrong?

grssnbchr asked Jan 15 '15 14:01


1 Answer

plot <- ggplot(data = data, aes(x = group, y = score)) + 
  stat_summary(fun.y = "sum", geom = "bar", position = "identity")

[image: resulting plot]

aggregate(score ~ group, data=data, FUN=sum)
#  group    score
#1     1 51.71279
#2     2 58.94611
#3     3 67.52100
#4     4 39.24484
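
The `fun.y` argument above matches ggplot2 1.0.0, the version used in the question. As a minimal sketch, assuming a newer ggplot2 release (3.3.0 or later, where `fun.y` is deprecated in favour of `fun`), the same plot could be written as follows. The seed and the simplified data construction are assumptions made only to keep the example self-contained:

```r
library(ggplot2)

# Demo data of the same shape as in the question (seed chosen arbitrarily)
set.seed(1)
group <- rep(1:4, 10)
score <- c(5, 5, 7, 4)[group] + rnorm(40, 0, 3)
data <- data.frame(group, score)

# One bar per group whose height is the sum of score in that group
p <- ggplot(data, aes(x = group, y = score)) +
  stat_summary(fun = sum, geom = "bar")

# Alternative: aggregate first, then draw the precomputed sums directly
sums <- aggregate(score ~ group, data = data, FUN = sum)
p2 <- ggplot(sums, aes(x = group, y = score)) +
  geom_col()
```

Aggregating first keeps the computation visible in a plain data frame, which makes the bar heights easy to check before plotting.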


stat_sum does not work because it does not compute the sum of the y values. It returns the "number of observations at position" and the "percent of points in that panel at that position". It was designed for a different purpose; the docs say "Useful for overplotting on scatterplots."
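
To make this concrete, here is a rough base R sketch of the kind of tally stat_sum produces. It is an approximation for illustration, not ggplot2's actual implementation; the seed and the column name `n_obs` are assumptions:

```r
# Demo data of the same shape as in the question (seed chosen arbitrarily)
set.seed(1)
group <- rep(1:4, 10)
score <- c(5, 5, 7, 4)[group] + rnorm(40, 0, 3)
data <- data.frame(group, score)

# Count the observations at each distinct (group, score) position
counts <- aggregate(n_obs ~ group + score,
                    data = transform(data, n_obs = 1),
                    FUN = sum)

# Continuous scores are practically never repeated, so every count is 1
table(counts$n_obs)

# The per-group totals the question is actually after
sums <- tapply(data$score, data$group, sum)
sums
```

Because every position holds exactly one observation, the counts carry no information about the group totals.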

stat_identity (kind of) works because geom_bar stacks bars by default. You end up with many bars stacked on top of each other, in contrast to my solution, which gives you just one bar per group. Look at this:

plot <- ggplot(data = data, aes(x = group, y = score)) + 
  geom_bar(stat="identity", color = "red") 

Also consider the warning:

Warning message:
Stacking not well defined when ymin != 0
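
The stacking behaviour can be approximated in base R. This is a simplified model of stacking, not ggplot2's internal code; the seed and the column names `ymin` and `ymax` are assumptions. It shows why the top of each stack coincides with the group sum, and it checks for negative scores, which are a plausible cause of the warning because a bar for a negative value starts below zero:

```r
# Demo data of the same shape as in the question (seed chosen arbitrarily)
set.seed(1)
group <- rep(1:4, 10)
score <- c(5, 5, 7, 4)[group] + rnorm(40, 0, 3)
data <- data.frame(group, score)

# Naive stacking: each segment starts where the previous one in its group ended
data$ymax <- ave(data$score, data$group, FUN = cumsum)
data$ymin <- data$ymax - data$score

# The top of the last segment in each group equals that group's sum
stack_top <- tapply(data$ymax, data$group, function(v) v[length(v)])
group_sum <- tapply(data$score, data$group, sum)
all.equal(as.vector(stack_top), as.vector(group_sum))

# Negative scores make segments point downwards, so the stack is ambiguous
any(data$score < 0)
```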
Roland answered Oct 25 '22 00:10
