
summarizing counts of a factor with dplyr

I want to group a data frame by a column (owner) and output a new data frame that has counts of each type of a factor at each observation. The real data frame is fairly large, and there are 10 different factors.

Here is some example input:

    library(dplyr)
    df = tbl_df(data.frame(owner=c(0,0,1,1),
                           obs1=c("quiet", "loud", "quiet", "loud"),
                           obs2=c("loud", "loud", "quiet", "quiet")))

      owner  obs1  obs2
    1     0 quiet  loud
    2     0  loud  loud
    3     1 quiet quiet
    4     1  loud quiet

I was looking for output that looks like this:

    out = data.frame(owner=c("0", "0", "1", "1"),
                     observation=c("obs1", "obs2", "obs1", "obs2"),
                     quiet=c(1, 0, 1, 2),
                     loud=c(1, 2, 1, 0))

      owner observation quiet loud
    1     0        obs1     1    1
    2     0        obs2     0    2
    3     1        obs1     1    1
    4     1        obs2     2    0

Melting gets me partway there:

    library(reshape2)  # melt() is assumed to come from reshape2
    melted = tbl_df(melt(df, id=c("owner")))

      owner variable value
    1     0     obs1 quiet
    2     0     obs1  loud
    3     1     obs1 quiet
    4     1     obs1  loud
    5     0     obs2  loud
    6     0     obs2  loud
    7     1     obs2 quiet
    8     1     obs2 quiet

But what's the last step? If 'value' were numeric, I'd just go:

    melted %>%
      group_by(owner, variable) %>%
      summarise(counts=sum(value))

Thanks so much!

Rory Kirchner asked Sep 12 '14 15:09


2 Answers

You could use tidyr with dplyr:

    library(dplyr)
    library(tidyr)

    df %>%
      gather(observation, Val, obs1:obs2) %>%
      group_by(owner, observation, Val) %>%
      summarise(n = n()) %>%
      ungroup() %>%
      spread(Val, n, fill=0)

which gives the output

    #    owner observation loud quiet
    #1     0        obs1    1     1
    #2     0        obs2    2     0
    #3     1        obs1    1     1
    #4     1        obs2    0     2

akrun answered Sep 26 '22 03:09


In 2017 the answer is:

    library(dplyr)
    library(tidyr)

    gather(df, key, value, -owner) %>%
      group_by(owner, key, value) %>%
      tally %>%
      spread(value, n, fill = 0)

which gives the output:

    Source: local data frame [4 x 4]
    Groups: owner, key [4]

      owner   key  loud quiet
    * <dbl> <chr> <dbl> <dbl>
    1     0  obs1     1     1
    2     0  obs2     2     0
    3     1  obs1     1     1
    4     1  obs2     0     2

In 2019 the answer is:

    gather(df, key, value, -owner) %>%
      count(owner, key, value) %>%
      spread(value, n, fill = 0)
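Since tidyr 1.0.0, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A rough sketch of the same pipeline with those functions might look like the following. It assumes tidyr 1.0.0 or later; the names `observation`, `value` and `result` are just illustrative, and the example data is rebuilt with a plain data.frame() so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

# Example data from the question (strings kept as character, not factor)
df <- data.frame(
  owner = c(0, 0, 1, 1),
  obs1  = c("quiet", "loud", "quiet", "loud"),
  obs2  = c("loud", "loud", "quiet", "quiet"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Stack obs1 and obs2 into (observation, value) pairs
  pivot_longer(-owner, names_to = "observation", values_to = "value") %>%
  # Count each owner / observation / value combination
  count(owner, observation, value) %>%
  # One column per value; combinations that never occur are filled with 0
  pivot_wider(names_from = value, values_from = n,
              values_fill = list(n = 0))

result
```

The named-list form of `values_fill` is used here because it should be accepted by every tidyr release that has pivot_wider(); newer releases also accept a plain scalar.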
Sam Clifford answered Sep 22 '22 03:09