Let's use the mpg dataset as an example, specifically the class and cyl columns. I can see how many entries there are per class, and differentiate the fill color based on the cyl value:
library(ggplot2)
p <- ggplot(mpg)
p <- p + geom_bar(mapping=aes(x=class, fill=factor(cyl)), position=position_dodge())
print(p)
What I'd like to see, though, is the average count of entries per class, averaged across the different values of cyl. Basically, if you look at the plot above, I want a single bar per class, whose height is the average height of the colored bars for that class.
I am able to get this result by preprocessing the data frame, e.g.:
# rows per class divided by the number of distinct cyl values in that class
df <- aggregate(cyl ~ class, data = mpg, FUN = function(x) { length(x) / length(unique(x)) })
p <- ggplot(df)
p <- p + geom_bar(mapping=aes(x=class, y=cyl), stat='identity')
p <- p + ylab('average count')
That gives my desired output:
However, given how powerful ggplot2 is, I am wondering whether this is possible using ggplot2 functions alone. I guess this involves using a specific stat (maybe with group=cyl?), but I have not been able to get it to work.
To compute a mean with the aggregate function in R, pass the numerical variable as the first argument, the categorical variable (wrapped in a list) as the second, and the function to apply (in this case mean) as the third. An alternative is to specify a formula of the form numerical ~ categorical.
aggregate is a base R function which, as the name suggests, aggregates the input data frame by applying the function given in the FUN parameter to each column of the sub-data frames defined by the by parameter. The by parameter has to be a list.
The aggregate() function is used to get summary statistics of the data by group, such as the mean, minimum, or sum.
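As a small sketch of the two calling styles described above, here is the mean of hwy per class on the mpg data. The choice of hwy as the numerical column and the names by_list and by_formula are only illustrative assumptions, not part of the original question.

```r
library(ggplot2)  # only needed here for the mpg dataset

# Style 1: numerical vector first, grouping variable(s) as a list, then the function.
by_list <- aggregate(mpg$hwy, by = list(class = mpg$class), FUN = mean)

# Style 2: formula of the form numerical ~ categorical.
by_formula <- aggregate(hwy ~ class, data = mpg, FUN = mean)

# The first style names the result column "x"; the formula style keeps "hwy".
print(by_list)
print(by_formula)
```

Both calls should return one row per class with the same means; only the name of the result column differs.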
ggplot2 allows you to do data manipulation, such as filtering or slicing, within the data argument.
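A minimal sketch of what that can look like, assuming we want to drop the 5-cylinder cars from the bar chart in the question. The filter condition and the names p1 and p2 are made up for illustration.

```r
library(ggplot2)

# Option 1: subset the data frame directly in the data argument of ggplot().
p1 <- ggplot(mpg[mpg$cyl != 5, ]) +
  geom_bar(aes(x = class, fill = factor(cyl)), position = position_dodge())

# Option 2: give the layer a function as its data argument;
# ggplot2 calls it with the plot's data and uses the result for that layer.
p2 <- ggplot(mpg) +
  geom_bar(data = function(d) d[d$cyl != 5, ],
           aes(x = class, fill = factor(cyl)), position = position_dodge())

print(p1)
print(p2)
```

The second form is handy when only one layer should see the filtered data while other layers keep using the full data frame.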
We can plug your formula straight into stat_summary() to generate the desired result without intermediate steps:
library(ggplot2)
ggplot(mpg) +
  # fun replaces fun.y, which is deprecated as of ggplot2 3.3.0
  stat_summary(aes(x = class, y = cyl),
               fun = function(x) length(x) / length(unique(x)),
               geom = "bar") +
  ylab("average count")
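If you want to convince yourself that this matches the preprocessed version from the question, one way is to compare the values ggplot2 computes for the layer with the aggregate() result. This is only a sketch: the helper name avg_count is made up here, and it assumes ggplot2 3.3.0 or later for the fun argument.

```r
library(ggplot2)

# Hypothetical helper: number of rows divided by number of distinct values.
avg_count <- function(x) length(x) / length(unique(x))

# Preprocessed version, as in the question.
df <- aggregate(cyl ~ class, data = mpg, FUN = avg_count)

# stat_summary version.
p <- ggplot(mpg) +
  stat_summary(aes(x = class, y = cyl), fun = avg_count, geom = "bar")

# layer_data() returns the data ggplot2 computed for the first layer.
computed <- layer_data(p)
check <- all.equal(as.numeric(computed$y), df$cyl)
print(check)
```

Both approaches order the classes alphabetically, so the bar heights can be compared position by position.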