Multiple aggregations (categorical and numeric) with dplyr in one chain

Question

I was trying to do an aggregation with dplyr in R today but could not come up with a solution, although I suspect it should be quite easy.

I have a data set like this:

dat <- structure(list(date = structure(c(16431, 16431, 16431, 16432, 
16432, 16432, 16433, 16433, 16433), class = "Date"), colour = structure(c(3L, 
1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L), .Label = c("blue", "green", 
"red"), class = "factor"), shape = structure(c(2L, 2L, 3L, 3L, 
3L, 2L, 1L, 1L, 1L), .Label = c("circle", "square", "triangle"
), class = "factor"), value = c(100, 130, 100, 180, 125, 190, 
120, 100, 140)), .Names = c("date", "colour", "shape", "value"
), row.names = c(NA, -9L), class = "data.frame")

which prints like this:

        date colour    shape value
1 2014-12-27    red   square   100
2 2014-12-27   blue   square   130
3 2014-12-27   blue triangle   100
4 2014-12-28  green triangle   180
5 2014-12-28  green triangle   125
6 2014-12-28    red   square   190
7 2014-12-29    red   circle   120
8 2014-12-29   blue   circle   100
9 2014-12-29   blue   circle   140

My goal is to calculate, for each day, the most frequent colour, the most frequent shape, and the mean value. My expected output is the following:

        date colour    shape value
1 2014-12-27   blue   square   110
2 2014-12-28  green triangle   165
3 2014-12-29   blue   circle   120

I ended up using split with a function of my own that calculates the above for a single data.frame, and then ran it in parallel with snow::clusterApply. That was efficient enough (my original dataset has about 10M rows), but I am wondering whether the same can be done in one chain using dplyr. Efficiency really matters here, so a single-chain solution would be very useful.
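For reference, a split-based approach of the kind described above might look roughly like the sketch below. This is not the actual function from the question, which is not shown: `summarise_day` is a hypothetical name, the sample data is rebuilt with data.frame() for readability, and a plain lapply stands in for snow::clusterApply so that the example runs without a cluster.

```r
# Sample data from the question, rebuilt with data.frame() for readability.
dat <- data.frame(
  date   = as.Date(c("2014-12-27", "2014-12-27", "2014-12-27",
                     "2014-12-28", "2014-12-28", "2014-12-28",
                     "2014-12-29", "2014-12-29", "2014-12-29")),
  colour = factor(c("red", "blue", "blue", "green", "green", "red",
                    "red", "blue", "blue")),
  shape  = factor(c("square", "square", "triangle", "triangle", "triangle",
                    "square", "circle", "circle", "circle")),
  value  = c(100, 130, 100, 180, 125, 190, 120, 100, 140)
)

# Hypothetical per-day summary (illustrative name, not the asker's code):
# most frequent colour, most frequent shape, and mean value of one piece.
summarise_day <- function(d) {
  data.frame(
    date   = d$date[1],
    colour = names(which.max(table(d$colour))),
    shape  = names(which.max(table(d$shape))),
    value  = mean(d$value)
  )
}

# Split by day, summarise each piece, and bind the pieces back together.
# lapply() is used here in place of a parallel apply such as snow::clusterApply.
pieces <- split(dat, dat$date)
result <- do.call(rbind, lapply(pieces, summarise_day))
rownames(result) <- NULL
```

The result has one row per day, which is the shape of the expected output above.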

David Robinson · Accepted Answer

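You could do this in one chain, using table() to count each level within a day and which.max() to pick the most frequent one:

library(dplyr)

# dat is the data frame from the question
dat %>%
    group_by(date) %>%
    summarize(colour = names(which.max(table(colour))),  # most frequent colour
              shape = names(which.max(table(shape))),    # most frequent shape
              value = mean(value))                       # mean value per day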
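To see what the `names(which.max(table(x)))` idiom does, it can help to run it on a single day's values in base R. The sketch below uses the colours from 2014-12-27 and then wraps the idiom in a helper; `most_frequent` is a hypothetical name introduced here, not a dplyr function. Two things are worth noticing: the result is a character string rather than a factor, and on a tie which.max() returns the first maximum, so the earliest factor level wins.

```r
# Colours observed on 2014-12-27 in the question's data.
colour <- factor(c("red", "blue", "blue"), levels = c("blue", "green", "red"))

counts <- table(colour)          # one count per level, including unused levels
top <- names(which.max(counts))  # name of the first level with the highest count

# Hypothetical helper wrapping the idiom; returns a character string.
most_frequent <- function(x) names(which.max(table(x)))

# On a tie, which.max() keeps the first maximum, so the earliest level wins.
tied <- most_frequent(factor(c("red", "blue"), levels = c("blue", "red")))
```

If a different tie-breaking rule matters for your data, that would need to be handled explicitly; the idiom itself does not report ties.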

Tags: r, dplyr

LyzandeR
