I can aggregate a data.frame trivially with dplyr with the following:
z <- data.frame(a = rnorm(20), b = rep(letters[1:4], each = 5))
library(dplyr)
z %>%
  group_by(b) %>%
  summarise(out = n())
Source: local data frame [4 x 2]
b out
(fctr) (int)
1 a 5
2 b 5
3 c 5
4 d 5
However, sometimes a dataset may be missing a factor level, in which case I would like the count for that level to be 0.
For example, let's say the typical dataset should have 5 groups.
z$b <- factor(z$b, levels = letters[1:5])
Clearly there are no observations for level e in this particular dataset, but there could be in another. How can I aggregate the data so that the count for missing factor levels is 0?
Desired output:
Source: local data frame [5 x 2]
b out
(fctr) (int)
1 a 5
2 b 5
3 c 5
4 d 5
5 e 0
One way to approach this is to use complete() from "tidyr". You first have to use mutate() to make column "b" a factor with all five levels:
library(dplyr)
library(tidyr)
z %>%
  mutate(b = factor(b, levels = letters[1:5])) %>%
  group_by(b) %>%
  summarise(out = n()) %>%
  complete(b, fill = list(out = 0))
# Source: local data frame [5 x 2]
#
# b out
# (fctr) (dbl)
# 1 a 5
# 2 b 5
# 3 c 5
# 4 d 5
# 5 e 0
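As a possible alternative, more recent versions of dplyr (0.8.0 and later, as far as I know) accept a .drop argument in count() and group_by() that keeps empty factor levels, so tidyr may not be needed at all. The following is a minimal sketch under that assumption; the result name res is only illustrative, and the data is rebuilt from the question so the snippet runs on its own:

```r
library(dplyr)

# Same example data as in the question
z <- data.frame(a = rnorm(20), b = rep(letters[1:4], each = 5))

# Declare all five expected levels, including the unobserved "e"
z$b <- factor(z$b, levels = letters[1:5])

# .drop = FALSE keeps factor levels that have no rows;
# name = "out" names the count column like the summarise() version
res <- z %>%
  count(b, name = "out", .drop = FALSE)

print(res)
```

Because the counts come straight from count(), the out column should stay integer rather than becoming double, which is what happens in the complete() version when the fill value is 0 instead of 0L.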