From this question we see a simple geom_line in the answer.
library(dplyr)
library(lubridate)  # provides year()
library(ggplot2)
BactData %>% filter(year(Date) == 2017) %>%
  ggplot(aes(Date, Svartediket_CB)) + geom_line()
If we change geom_line to geom_bar we may expect to see a bar plot, but instead we get:
Error: stat_count() must not be used with a y aesthetic.
But it works if we add stat = "identity", like so:
library(dplyr)
library(lubridate)  # provides year()
library(ggplot2)
BactData %>% filter(year(Date) == 2017) %>%
  ggplot(aes(Date, Svartediket_CB)) + geom_bar(stat = "identity")
Why doesn't geom_bar work without stat = "identity", i.e. what is the purpose of stat = "identity"?
With stat = "identity", we are asking ggplot2 to use the y values we provide as the bar heights. If we specify stat = "count", or leave the stat argument of geom_bar() unset, ggplot2 counts the number of observations in each x group instead.
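To make the difference concrete, here is a minimal sketch on a made-up data frame (df, site and value are hypothetical names, not the BactData from the question). It builds one plot with the default stat and one with stat = "identity", then uses layer_data() to inspect the heights ggplot2 computed.

```r
library(ggplot2)

# Hypothetical toy data: one row per observation
df <- data.frame(
  site  = c("A", "A", "A", "B", "B"),
  value = c(10, 20, 30, 5, 15)
)

# Default stat = "count": bar height is the number of rows per site
p_count <- ggplot(df, aes(site)) + geom_bar()

# stat = "identity": bar height comes from the y values we supply;
# several rows with the same x are stacked on top of each other
p_identity <- ggplot(df, aes(site, value)) + geom_bar(stat = "identity")

# Inspect what each layer actually computed
count_heights <- layer_data(p_count)$count
identity_data <- layer_data(p_identity)
```

Here count_heights holds the row counts per site (3 and 2), while the stacked bars in identity_data reach the sum of value per site.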
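To plot a summary such as the mean, simply use stat = "summary" and fun = "mean" (the argument was called fun.y before ggplot2 3.3.0, where it was deprecated in favor of fun):
ggplot(test2) + geom_bar(aes(label, X2, fill = as.factor(groups)), position = "dodge", stat = "summary", fun = "mean")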
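A small self-contained sketch of the summary approach, assuming ggplot2 3.3.0 or later (where the argument is fun). The data frame and its columns are invented for illustration.

```r
library(ggplot2)

# Hypothetical toy data: two observations per label
df <- data.frame(
  label = c("a", "a", "b", "b"),
  X2    = c(1, 3, 10, 20)
)

# stat = "summary" collapses the y values within each x group using `fun`
p_mean <- ggplot(df, aes(label, X2)) +
  geom_bar(stat = "summary", fun = "mean")

# Heights computed by the layer: the mean of X2 within each label
mean_heights <- layer_data(p_mean)$y
```

With this data the bars should sit at the group means, 2 for "a" and 15 for "b".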
geom_col() takes the height of each bar directly from the values in the dataset.
The color parameter modifies the color of the border of the bars.
There are two layers that are closely related: geom_bar() and geom_col(). The key difference is how they aggregate the data by default.
For geom_bar(), the default behavior is to count the rows for each x value. It doesn't expect a y value, since it's going to count that up itself. In fact, it will raise an error if you give it one (the stat_count() error shown in the question), since it thinks you're confused. How aggregation is performed is controlled by the stat argument of geom_bar(), whose default value is stat = "count".
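The failure can be reproduced on a tiny made-up data frame (df, site and value are hypothetical). This sketch only checks that building the plot raises an error; the exact wording of the message differs between ggplot2 versions, so it is not matched here.

```r
library(ggplot2)

# Hypothetical toy data (not the BactData from the question)
df <- data.frame(site = c("A", "B"), value = c(10, 20))

# Default stat = "count" combined with a y aesthetic
p <- ggplot(df, aes(site, value)) + geom_bar()

# The stat is only computed when the plot is built (or printed),
# so capture the error from ggplot_build() instead of stopping
result <- tryCatch(ggplot_build(p), error = function(e) e)
failed <- inherits(result, "error")
```

Note that creating p succeeds; the error only surfaces at build or print time, which is why it can look as if it comes from the print step.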
If you explicitly say stat = "identity" in geom_bar(), you're telling ggplot2 to skip the aggregation and that you'll provide the y values. This mirrors the natural behavior of geom_col() below.
In the case of geom_col()
, it won't try to aggregate the data by default. From the docs, "geom_col()
uses stat_identity()
: it leaves the data as is". So, it expects you to already have the y values calculated and to use them directly. And geom_col()
doesn't have an argument to change that behavior - it's always going to plot your y values that you provide, and you need to provide them.
If you have y values, you could use either syntax, but I find geom_col() more direct.
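As a quick check of that equivalence, here is a sketch with invented, pre-aggregated data (totals, site and total are hypothetical names). It builds the same plot both ways and compares the bar heights reported by layer_data().

```r
library(ggplot2)

# Hypothetical pre-aggregated data: one row per bar
totals <- data.frame(
  site  = c("A", "B", "C"),
  total = c(60, 20, 35)
)

# Two spellings of the same plot
p_col <- ggplot(totals, aes(site, total)) + geom_col()
p_bar <- ggplot(totals, aes(site, total)) + geom_bar(stat = "identity")

# Compare the bar heights computed by each layer
same_heights <- isTRUE(all.equal(layer_data(p_col)$y, layer_data(p_bar)$y))
```

If same_heights is TRUE, both layers draw bars of identical height, so the choice between them is mostly about readability.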
@Stevec, I found the answer at rdocumentation.org. See below for what stat = "identity" means:
"The heights of the bars commonly represent one of two things: either a count of cases in each group, or the values in a column of the data frame. By default, geom_bar uses stat="bin". This makes the height of each bar equal to the number of cases in each group, and it is incompatible with mapping values to the y aesthetic. If you want the heights of the bars to represent values in the data, use stat="identity" and map a value to the y aesthetic."
(Note: this quote comes from an older version of the documentation. Since ggplot2 2.0.0 the default for geom_bar is stat = "count" rather than stat = "bin".)
Hope this was helpful.
Follow the link to documentation: geom_bar documentation