
ggplot geom_bar with stat = "sum"

Tags: r, ggplot2

I want to plot a bar chart that sums a variable along two dimensions: one spread along x, and the other spread vertically (stacked).

I would expect the two following instructions to do the same, but they don't and only the 2nd one gives the desired output (where I aggregate the data myself).

I'd like to understand what's going on in the first case, and whether there's a way to use ggplot2's built-in aggregation features to get the right output.

library(ggplot2)
library(dplyr)
p1 <- ggplot(diamonds,aes(cut,price,fill=color)) + 
  geom_bar(stat="sum",na.rm=TRUE)

yielding this plot:

[Image: bar chart produced by p1]

p2 <- ggplot(diamonds %>%
                group_by(cut,color) %>%
                summarize_at("price",sum,na.rm=TRUE),
              aes(cut,price,fill=color)) +
  geom_bar(stat="identity",na.rm=TRUE)

yielding this picture:

[Image: bar chart produced by p2]

Here's where the tops of our bars should be; p1 doesn't give these values:

diamonds %>% group_by(cut) %>% summarize_at("price",sum,na.rm=TRUE)
# # A tibble: 5 x 2
#   cut          price
#   <ord>        <int>
# 1 Fair       7017600
# 2 Good      19275009
# 3 Very Good 48107623
# 4 Premium   63221498
# 5 Ideal     74513487
Moody_Mudskipper asked Dec 14 '17 16:12


People also ask

What is stat in geom_bar?

By default, geom_bar uses stat="count", which makes the height of the bar proportional to the number of cases in each group (or, if the weight aesthetic is supplied, the sum of the weights). If you want the heights of the bars to represent values in the data, use stat="identity" and map a variable to the y aesthetic.

What is the difference between geom_bar() and geom_bar(stat = "identity")?

The key difference is how they aggregate the data by default. For geom_bar(), the default behavior is to count the rows for each x value. It doesn't expect a y value, since it's going to count that up itself. In fact, older versions of ggplot2 warned if you gave it one, and current versions raise an error when both x and y are mapped.

What does geom_col do in R?

geom_col makes the heights of the bars represent values in the data.

What is geom_bar?

geom_bar() makes the height of the bar proportional to the number of cases in each group (or if the weight aesthetic is supplied, the sum of the weights).




1 Answer

The stat option of geom_bar may not do what you expect here. stat = "sum" selects stat_sum, the stat behind geom_count: it counts how many observations share each (x, y) location and does not add up y, so p1 is not plotting sums of price. Since you want the prices in each cut summed within one bar, with the bar split by how much of that total falls in each color, you can simplify the call to geom_col. It uses the values as bar heights and stacks the rows that share an x position, which effectively "sums" all the values within each category. For example, the following will give the desired output:

p1 <- ggplot(diamonds,aes(cut,price,fill=color)) + 
        geom_col(na.rm=TRUE)
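
One way to check that the stacked geom_col bars reach the expected totals is to look at the data ggplot2 computes for the layer. This is a minimal sketch, assuming layer_data() is available in your ggplot2 version; p_col, built, tops, and expected are names chosen here for illustration.

```r
library(ggplot2)

p_col <- ggplot(diamonds, aes(cut, price, fill = color)) +
  geom_col(na.rm = TRUE)

# layer_data() returns the data frame ggplot2 builds for a layer.
# After stacking, ymax is the top edge of each segment.
built <- layer_data(p_col)

# Highest ymax at each x position = top of the stacked bar for that cut.
tops <- tapply(built$ymax, as.integer(built$x), max)

# Totals computed directly from the data, in factor-level order.
expected <- tapply(diamonds$price, diamonds$cut, sum)

# Compare the two sets of totals, ignoring names.
all.equal(unname(as.numeric(tops)), unname(as.numeric(expected)))
```

Note that geom_col draws one rectangle per row here (one for each diamond), so aggregating first, as in the question's p2, produces a much lighter plot object with the same bar heights.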

Alternatively, if you want to use geom_bar with a stat call, then you want to use the "identity" stat:

p1 <- ggplot(diamonds,aes(cut,price,fill=color)) + 
        geom_bar(stat = "identity", na.rm=TRUE)
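
If the goal is to let ggplot2 do the aggregation itself, two options that should work are the weight aesthetic of the default count stat and stat_summary with an explicit summary function. The sketch below assumes ggplot2 3.3.0 or later for the `fun` argument (older versions used `fun.y`); p_weight, p_summary, and the helper top_of_bars are hypothetical names used only for this illustration.

```r
library(ggplot2)

# Option A: the default stat of geom_bar (stat_count) sums the weight
# aesthetic instead of counting rows when a weight is supplied.
p_weight <- ggplot(diamonds, aes(cut, fill = color)) +
  geom_bar(aes(weight = price))

# Option B: stat_summary applies sum() to price within each cut/color group
# and draws the result as stacked bars.
p_summary <- ggplot(diamonds, aes(cut, price, fill = color)) +
  stat_summary(fun = sum, geom = "bar", position = "stack")

# Hypothetical helper: top edge of the stacked bar at each x position.
top_of_bars <- function(p) {
  built <- layer_data(p)
  unname(as.numeric(tapply(built$ymax, as.integer(built$x), max)))
}

# Totals computed directly from the data, in factor-level order.
expected <- unname(as.numeric(tapply(diamonds$price, diamonds$cut, sum)))

all.equal(top_of_bars(p_weight), expected)
all.equal(top_of_bars(p_summary), expected)
```

Both variants produce one rectangle per cut and color combination, like the pre-aggregated plot in the question.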

For more information, consider this thread: https://stackoverflow.com/a/27965637/6722506
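
To see what the original stat = "sum" call actually plots, you can inspect the data the layer computes. This is a minimal sketch using layer_data(); p_sum and d are illustrative names, and the interpretation in the comments is my reading of stat_sum rather than something stated in the linked thread.

```r
library(ggplot2)

p_sum <- ggplot(diamonds, aes(cut, price, fill = color)) +
  geom_bar(stat = "sum", na.rm = TRUE)

d <- layer_data(p_sum)

# stat_sum returns one row per distinct combination of the mapped aesthetics
# (here cut, price, color). The count of observations goes into n, while y
# is still an individual price rather than a sum.
head(d[, c("x", "y", "n", "prop")])

# Rows with the same cut, color, and price were collapsed into one row...
nrow(d) < nrow(diamonds)

# ...and the counts add back up to the number of rows in the data.
sum(d$n) == nrow(diamonds)
```

Because the bars are built from y and the counts in n are ignored, each distinct price is stacked only once per cut and color, which would explain why the tops of the bars in the first plot fall short of the true totals.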

creutzml answered Oct 08 '22 00:10