
R: aggregate by all factor levels (present and not present)

Tags: r, aggregate, dplyr

I can easily aggregate a data.frame by group with dplyr:

z <- data.frame(a = rnorm(20), b = rep(letters[1:4], each = 5))

library(dplyr)

z %>%
  group_by(b) %>%
  summarise(out = n())

Source: local data frame [4 x 2]

       b   out
  (fctr) (int)
1      a     5
2      b     5
3      c     5
4      d     5

However, sometimes a dataset may be missing a factor level entirely, in which case I would like the count for that level to be 0.

For example, let's say the typical dataset should have 5 groups.

z$b <- factor(z$b, levels = letters[1:5])

Clearly there are no rows for level "e" in this particular dataset, but there could be in another. How can I aggregate the data so that the count for missing factor levels is 0?
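To make the setup concrete, here is a small sketch that rebuilds the example data and checks that the unused level is registered on the factor. The `set.seed()` call is an addition for reproducibility and is not part of the original example. It relies only on base R, where `table()` reports unused levels of a factor with a count of 0.

```r
# Rebuild the example data so the snippet runs on its own.
set.seed(1)  # assumption: added only to make rnorm() reproducible
z <- data.frame(a = rnorm(20), b = rep(letters[1:4], each = 5))
z$b <- factor(z$b, levels = letters[1:5])

levels(z$b)  # all five declared levels, including the unused "e"
table(z$b)   # base R counts per level; the unused level is counted as 0
```

So the information about the fifth level is present in the data; the question is how to carry it through a dplyr aggregation.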

Desired output:

Source: local data frame [5 x 2]

       b   out
  (fctr) (int)
1      a     5
2      b     5
3      c     5
4      d     5
5      e     0
cdeterman asked Jan 23 '26 16:01



1 Answer

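One way to approach this is to use complete() from tidyr. Column "b" must be a factor that declares all five levels, so convert it with mutate() first; complete() then adds a row for every level that has no data and fills "out" with 0: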

library(dplyr)
library(tidyr)

z %>%
  mutate(b = factor(b, letters[1:5])) %>%
  group_by(b) %>%
  summarise(out = n()) %>%
  complete(b, fill = list(out = 0))
# Source: local data frame [5 x 2]
# 
#        b   out
#   (fctr) (dbl)
# 1      a     5
# 2      b     5
# 3      c     5
# 4      d     5
# 5      e     0
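As an alternative sketch, assuming dplyr 0.8.0 or later is installed (the `.drop` argument of `group_by()` does not exist in older releases), empty factor levels can be kept as groups directly, without tidyr. The `set.seed()` call is again only there for reproducibility.

```r
library(dplyr)

set.seed(1)  # assumption: added only to make rnorm() reproducible
z <- data.frame(a = rnorm(20), b = rep(letters[1:4], each = 5))

res <- z %>%
  mutate(b = factor(b, levels = letters[1:5])) %>%
  group_by(b, .drop = FALSE) %>%  # keep groups for factor levels with no rows
  summarise(out = n())            # n() is 0 for an empty group

res
```

With this approach "out" stays an integer column, because no double-valued fill is involved. If you stay with complete(), passing 0L instead of 0 as the fill value should have the same effect.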
A5C1D2H2I1M1N2O1R2T1 answered Jan 25 '26 07:01
