I would like some help with dplyr code that I cannot figure out. I have seen a similar issue described for many variables (summarizing counts of a factor with dplyr and Putting rowwise counts of value occurences into new variables, how to do that in R with dplyr?), but my task is somewhat smaller.
Given a data frame, how do I count the frequency of a variable and place that in a new variable?
set.seed(9)
df <- data.frame(
group=c(rep(1,5), rep(2,5)),
var1=round(runif(10,1,3),0))
Then we have:
>df
group var1
1 1 1
2 1 1
3 1 1
4 1 1
5 1 2
6 2 1
7 2 2
8 2 2
9 2 2
10 2 3
I would like a third column indicating, per group (group), how many times each value of var1 occurs. In this example this would be: count=(4,4,4,4,1,1,3,3,3,1).
I tried - without success - things like:
df %>% group_by(group) %>% rowwise() %>% do(count = nrow(.$var1))
Explanations are much appreciated!
All you need to do is group your data by both columns, "group" and "var1":
df %>% group_by(group, var1) %>% mutate(count = n())
#Source: local data frame [10 x 3]
#Groups: group, var1
#
# group var1 count
#1 1 1 4
#2 1 1 4
#3 1 1 4
#4 1 1 4
#5 1 2 1
#6 2 1 1
#7 2 2 3
#8 2 2 3
#9 2 2 3
#10 2 3 1
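As a rough sketch of why this works where the attempt in the question did not: nrow() returns NULL for a plain vector such as .$var1, whereas n() returns the number of rows in the current group. The example below types the data in from the printed output in the question (an assumption made so the sketch does not depend on the random seed) and adds ungroup(), since the result of mutate() otherwise stays grouped.

```r
library(dplyr)

# Example data, typed in from the printed output in the question.
df <- data.frame(
  group = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  var1  = c(1, 1, 1, 1, 2, 1, 2, 2, 2, 3)
)

# nrow() is NULL for a plain vector, so nrow(.$var1) cannot yield a count.
is.null(nrow(df$var1))

res <- df %>%
  group_by(group, var1) %>%   # one group per (group, var1) combination
  mutate(count = n()) %>%     # n() is the number of rows in the current group
  ungroup()                   # drop the grouping so later steps are not grouped

res$count
```

Whether to call ungroup() depends on what comes next; it is included here only to show that the grouping persists unless removed.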
Here's an example of how you SHOULD NOT DO IT:
df %>% group_by(group, var1) %>% do(data.frame(., count = length(.$group)))
The dplyr implementation with n() is faster, cleaner and shorter, and should be preferred over implementations like the one above.
Perhaps this is new functionality, but it can be done with one dplyr command:
df %>% add_count(group, var1)
group var1 n
1 1 1 4
2 1 1 4
3 1 1 4
4 1 1 4
5 1 2 1
6 2 1 1
7 2 2 3
8 2 2 3
9 2 2 3
10 2 3 1
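The new column above is called n, while the question asked for count. A minimal sketch, assuming a dplyr version in which add_count() accepts the name argument (the data is typed in from the question's printed output):

```r
library(dplyr)

# Example data, typed in from the printed output in the question.
df <- data.frame(
  group = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  var1  = c(1, 1, 1, 1, 2, 1, 2, 2, 2, 3)
)

# name = "count" sets the name of the added column (the default is "n").
res <- df %>% add_count(group, var1, name = "count")

names(res)
res$count
```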
Another handy option is the tally function from dplyr, which returns one row per group combination rather than one row per original observation:
df %>% group_by(group, var1) %>% tally()
# Source: local data frame [5 x 3]
# Groups: group
#
# group var1 n
# 1 1 1 4
# 2 1 2 1
# 3 2 1 1
# 4 2 2 3
# 5 2 3 1
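Since tally() collapses the data to one row per combination, getting the per-row column the question asked for needs one more step. One possible sketch (not necessarily what the answerer had in mind) joins the summary back onto the original data; the data is typed in from the question's printed output.

```r
library(dplyr)

# Example data, typed in from the printed output in the question.
df <- data.frame(
  group = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  var1  = c(1, 1, 1, 1, 2, 1, 2, 2, 2, 3)
)

# Summary table: one row per (group, var1) combination, with the count in n.
counts <- df %>% group_by(group, var1) %>% tally()

# Join the counts back so every original row carries its combination's count.
res <- df %>% left_join(counts, by = c("group", "var1"))

res$n
```

For this task the mutate(count = n()) or add_count() approaches are shorter; the join is mainly useful when the summary table is needed anyway.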
Two alternatives:
1: with base R:
# option 1:
df$count <- ave(df$var1, df$var1, df$group, FUN = length)
# option 2:
df <- transform(df, count = ave(var1, var1, group, FUN = length))
which gives:
> df
group var1 count
1 1 1 4
2 1 1 4
3 1 1 4
4 1 1 4
5 1 2 1
6 2 1 1
7 2 2 3
8 2 2 3
9 2 2 3
10 2 3 1
2: with data.table:
library(data.table)
setDT(df)[, count := .N, by = .(group, var1)]
which gives the same result:
> df
group var1 count
1: 1 1 4
2: 1 1 4
3: 1 1 4
4: 1 1 4
5: 1 2 1
6: 2 1 1
7: 2 2 3
8: 2 2 3
9: 2 2 3
10: 2 3 1
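Note that setDT() converts df to a data.table in place, and := then adds the column by reference. If the original data frame should stay untouched, a sketch along these lines works on a copy instead (dt is a name chosen here for illustration; the data is typed in from the question's printed output):

```r
library(data.table)

# Example data, typed in from the printed output in the question.
df <- data.frame(
  group = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  var1  = c(1, 1, 1, 1, 2, 1, 2, 2, 2, 3)
)

# as.data.table() returns a new object, so df remains a plain data.frame.
dt <- as.data.table(df)

# .N is the number of rows in each (group, var1) combination;
# := adds it as a column to dt by reference.
dt[, count := .N, by = .(group, var1)]

dt$count
```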
If you want to summarise, you can use:
# with base R:
aggregate(id ~ group + var1, transform(df, id = 1), length)
# with 'dplyr':
count(df, group, var1)
# with 'data.table':
setDT(df)[, .N, by = .(group, var1)]