I want to plot a bar chart that sums a variable along two dimensions: one spread along the x axis and the other stacked vertically.
I would expect the following two instructions to do the same thing, but they don't; only the second one, where I aggregate the data myself, gives the desired output.
I'd like to understand what's going on in the first case, and whether there's a way to use ggplot2's built-in aggregation features to get the right output.
library(ggplot2)
library(dplyr)
p1 <- ggplot(diamonds, aes(cut, price, fill = color)) +
  geom_bar(stat = "sum", na.rm = TRUE)
yielding this plot:
p2 <- ggplot(diamonds %>%
               group_by(cut, color) %>%
               summarize_at("price", sum, na.rm = TRUE),
             aes(cut, price, fill = color)) +
  geom_bar(stat = "identity", na.rm = TRUE)
yielding this picture:
Here's where the tops of the bars should be; p1 doesn't give these values:
diamonds %>% group_by(cut) %>% summarize_at("price", sum, na.rm = TRUE)
# # A tibble: 5 x 2
#   cut          price
#   <ord>        <int>
# 1 Fair       7017600
# 2 Good      19275009
# 3 Very Good 48107623
# 4 Premium   63221498
# 5 Ideal     74513487
By default, geom_bar uses stat="count", which makes the height of the bar proportional to the number of cases in each group (or, if the weight aesthetic is supplied, the sum of the weights). If you want the heights of the bars to represent values in the data, use stat="identity" and map a variable to the y aesthetic.
The key difference is how they aggregate the data by default. For geom_bar(), the default behavior is to count the rows for each x value. It doesn't expect a y value, since it's going to compute the height itself; in fact, with the default stat, current versions of ggplot2 raise an error if you give it one, since it assumes you're confused.
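As for what p1 in the question actually draws: stat = "sum" does not add up y values. As far as I understand it, it is the stat behind geom_count(): it counts how many rows share each location (here each cut, price, color combination) and returns one row per distinct location, so repeated prices are collapsed before the bars are stacked. A rough sketch of the consequence, emulating that collapse with dplyr rather than calling ggplot2 internals (the names collapsed_tops and true_tops are made up for illustration):

```r
library(ggplot2)  # only needed here for the diamonds dataset
library(dplyr)

# Approximate what the bars in p1 stack: one segment per distinct
# (cut, color, price) combination, each with height equal to its price.
collapsed_tops <- diamonds %>%
  distinct(cut, color, price) %>%
  group_by(cut) %>%
  summarize(price = sum(price))

# What the question wants: every row contributes its price.
true_tops <- diamonds %>%
  group_by(cut) %>%
  summarize(price = sum(price))

# Rows with a repeated price are dropped by the collapse, so the
# collapsed totals can never exceed the true totals.
collapsed_tops$price <= true_tops$price
```

This would explain why the bar tops in p1 fall short of the per-cut totals.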
geom_col() takes the height of each bar directly from the values in the dataset.
geom_bar() makes the height of the bar proportional to the number of cases in each group (or if the weight aesthetic is supplied, the sum of the weights).
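The weight aesthetic is one way to have ggplot2 do the summing itself. A minimal sketch, assuming the diamonds data from the question; the names p_weight, built, and bar_tops are made up for illustration, and layer_data() is used only to inspect the computed bar heights:

```r
library(ggplot2)

# The default stat = "count" sums the weights within each cut/color group,
# so weighting by price gives stacked bars of total price.
p_weight <- ggplot(diamonds, aes(cut, fill = color, weight = price)) +
  geom_bar() +
  ylab("price")  # the default y label would otherwise read "count"

# Top of each stacked bar: the largest ymax per x position.
built <- layer_data(p_weight)
bar_tops <- tapply(built$ymax, as.integer(built$x), max)
bar_tops
```

The values in bar_tops should match the per-cut totals listed in the question.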
You might be misunderstanding the stat option for geom_bar. In this case, since you want the values for each factor level to be summed within each bar, and each bar to be colored according to how much of that total falls in each color, you can simplify the call to geom_col, which uses the values as bar heights and stacks them, and therefore "sums" all the values within each category. For example, the following will give the desired output:
p1 <- ggplot(diamonds, aes(cut, price, fill = color)) +
  geom_col(na.rm = TRUE)
Alternatively, if you want to use geom_bar with a stat call, then you want to use the "identity" stat:
p1 <- ggplot(diamonds, aes(cut, price, fill = color)) +
  geom_bar(stat = "identity", na.rm = TRUE)
For more information, consider this thread: https://stackoverflow.com/a/27965637/6722506
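Another option that keeps the aggregation inside ggplot2 is the "summary" stat with a summing function. This is a sketch that assumes ggplot2 3.3.0 or later, where the argument is called fun (older releases used fun.y); p3 is a hypothetical name continuing the question's p1 and p2:

```r
library(ggplot2)

# stat = "summary" applies `fun` to the y values of each cut/color group;
# geom_bar() then stacks the resulting sums, since its default position
# is "stack".
p3 <- ggplot(diamonds, aes(cut, price, fill = color)) +
  geom_bar(stat = "summary", fun = sum, na.rm = TRUE)
```

Unlike geom_col(), which draws and stacks one rectangle per row, this draws one rectangle per cut/color group, much like the pre-aggregated p2.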