
Aggregating data with ggplot

Tags: plot, r, ggplot2

Let's use the mpg dataset as an example, specifically the class and cyl columns. I can plot how many entries there are in each class and set the fill color by the cyl value:

library(ggplot2)
p <- ggplot(mpg)
p <- p + geom_bar(mapping=aes(x=class, fill=factor(cyl)), position=position_dodge())
print(p)

[Image: bar chart of entry counts per class, with dodged bars colored by cyl]

What I'd like to see, though, is, for each class, the average count of entries across the cyl values that occur in that class. Basically, if you look at the plot above, I want a single bar per class whose height is the average height of the colored bars for that class.
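To make the target quantity concrete, here is a minimal sketch in base R that computes it directly from the colored-bar heights. The names `counts` and `avg_count` are illustrative only, and the sketch assumes that cyl values with no entries in a class (which have no bar in the dodged plot) should be left out of the average:

```r
library(ggplot2)  # loaded only for the mpg dataset

# Counts per (class, cyl) pair: these are the heights of the colored bars.
counts <- table(mpg$class, mpg$cyl)

# For each class, divide the total count by the number of cyl values
# that actually occur in it, so empty cells do not pull the average down.
avg_count <- apply(counts, 1, function(row) sum(row) / sum(row > 0))
print(avg_count)
```

This is the same quantity as `length(x) / length(unique(x))` applied to the cyl values of each class.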

I am able to get this result by preprocessing the data frame, e.g.:

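# Average count per class: rows in the class divided by its number of distinct cyl values.
# The formula is passed unnamed because aggregate()'s first argument is `x` in R >= 4.2.
df <- aggregate(cyl ~ class, data=mpg, FUN=function(x) { length(x) / length(unique(x)) })
p <- ggplot(df)
p <- p + geom_bar(mapping=aes(x=class, y=cyl), stat='identity')
p <- p + ylab('average count')
print(p)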

That gives my desired output:

[Image: bar chart with one bar per class showing the average count]

However, given how powerful ggplot2 is, I am wondering whether this is possible with ggplot functions alone. I guess it involves a specific stat (maybe with group=cyl?), but I have not been able to get it working.

natario asked Jun 10 '16 13:06




1 Answer

We can plug your formula straight into stat_summary() to generate the desired result without intermediate steps:

library(ggplot2)
ggplot(mpg) +
  stat_summary(aes(x = class, y = cyl),
               # `fun` replaces `fun.y`, which is deprecated since ggplot2 3.3.0
               fun = function(x) length(x) / length(unique(x)),
               geom = "bar")

[Image: the resulting bar chart, one bar per class]
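One way to check that the stat really reproduces the preprocessing approach is to pull the computed values out of the plot and compare them with the `aggregate()` result. The sketch below assumes ggplot2 3.3.0 or later (for the `fun` argument) and uses `layer_data()` to read the values the layer computed; `avg_per_cyl`, `computed`, `expected`, and `comparison` are illustrative names, not part of the original answer:

```r
library(ggplot2)

# Summary function from the question: rows per distinct cyl value.
avg_per_cyl <- function(x) length(x) / length(unique(x))

# stat_summary() applies the function to the y values at each x (each class).
p <- ggplot(mpg) +
  stat_summary(aes(x = class, y = cyl), fun = avg_per_cyl, geom = "bar")

# Values computed by the layer, ordered by the position of each class on the x axis.
computed <- layer_data(p)
computed <- computed[order(computed$x), c("x", "y")]

# Reference values from the preprocessing approach, sorted by class.
expected <- aggregate(cyl ~ class, data = mpg, FUN = avg_per_cyl)

# Side-by-side comparison: the two numeric columns should agree.
comparison <- data.frame(class = expected$class,
                         stat_summary = as.numeric(computed$y),
                         aggregate = expected$cyl)
print(comparison)
```

Note that cyl is mapped to y only so that the summary function receives one value per row; the bar heights are counts divided by distinct values, not cylinder numbers, so relabeling the axis (for example with `ylab('average count')`) keeps the plot honest.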

mtoto answered Oct 16 '22 16:10