 

dplyr: Use a custom function in summarize() after group_by()

Tags:

r

dplyr

How can we use a custom function after group_by()? I checked similar posts (1, 2, and 3), but my current code returns the same value for every group.

> data
   village     A     Z      Y 
     <chr> <int> <int>   <dbl> 
 1       a     1     1   500     
 2       a     1     1   400     
 3       a     1     0   800  
 4       b     1     0   300  
 5       b     1     1   700  
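To make the example reproducible, the printed data can be rebuilt as a tibble. This is a minimal sketch: the values are copied from the printout above, and the column types (`int` for A and Z, `dbl` for Y) follow its type header.

```r
library(dplyr)  # re-exports tibble()

# Rebuild the example data shown in the printout above
data <- tibble(
  village = c("a", "a", "a", "b", "b"),
  A = c(1L, 1L, 1L, 1L, 1L),
  Z = c(1L, 1L, 0L, 0L, 1L),
  Y = c(500, 400, 800, 300, 700)
)
data
```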

library(dplyr)

Y_hat_village <- function(data_village, z){
    # Calculate the mean for a specific z in a village
    data_z <- data_village %>% filter(Z==get("z"))
    return(mean(data_z$Y))
}

z <- 1
data %>%
    group_by(village) %>%
    summarize(Y_village = Y_hat_village(., z))

I want to have (500 + 400)/2 = 450 for village "a" and 700 for village "b".
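As a cross-check of those expected values, here is a base R sketch without dplyr. It keeps only the rows with `Z == z` and averages `Y` within each village. The unused column A is omitted, and the data is retyped from the printout above.

```r
# Question's data (column A omitted because it is not used)
data <- data.frame(
  village = c("a", "a", "a", "b", "b"),
  Z = c(1, 1, 0, 0, 1),
  Y = c(500, 400, 800, 300, 700)
)
z <- 1

keep <- data$Z == z                       # rows where Z matches z
expected <- tapply(data$Y[keep], data$village[keep], mean)
expected
#   a   b
# 450 700
```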

asked Jun 19 '18 at 10:06 by user2978524


1 Answer

It's easier to understand if you start by writing it without an extra function. With the question's data stored in `df` and `z <- 1`, that would be:

df %>%
  group_by(village) %>%
  summarize(Y_village = mean(Y[Z == z]))

## A tibble: 2 x 2
#  village Y_village
#  <fct>       <dbl>
#1 a            450.
#2 b            700.

Hence, your function could look something like this:

Y_hat_village <- function(Ycol, Zcol, z){
  mean(Ycol[Zcol == z])
}
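Because this version works on plain vectors, it can be sanity-checked outside dplyr. The sketch below feeds it village a's values from the question (Y = 500, 400, 800 and Z = 1, 1, 0) and then village b's values (Y = 300, 700 and Z = 0, 1):

```r
Y_hat_village <- function(Ycol, Zcol, z){
  mean(Ycol[Zcol == z])
}

# Village a: only the rows with Z == 1 (500 and 400) are averaged
Y_hat_village(c(500, 400, 800), c(1, 1, 0), 1)
# [1] 450

# Village b: only the row with Z == 1 (700) is kept
Y_hat_village(c(300, 700), c(0, 1), 1)
# [1] 700
```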

Then call it inside summarize():

df %>%
  group_by(village) %>%
  summarize(Y_village = Y_hat_village(Y, Z, z))

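Note that the function I wrote only works on atomic vectors, which you can pass directly from within summarise(). You don't need to pass the whole data.frame into it. That is also why your version returns the same value for every group: the `.` inside summarize() is replaced by the pipe with the whole data frame, not the current group, so every group computes the mean over all rows with `Z == 1`.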
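If you would rather keep a helper that takes a data frame, one option is to pass only the current group's rows with `pick()`. This assumes dplyr 1.1.0 or later, where `pick()` was introduced. The helper name `Y_hat_village_df` is hypothetical and is chosen so it does not clash with the vector version above. This is a sketch, not the only way to do it:

```r
library(dplyr)  # pick() requires dplyr >= 1.1.0

data <- tibble(
  village = c("a", "a", "a", "b", "b"),
  Z = c(1L, 1L, 0L, 0L, 1L),
  Y = c(500, 400, 800, 300, 700)
)
z <- 1

# Hypothetical data-frame-based helper in the spirit of the question's function
Y_hat_village_df <- function(data_village, z) {
  # .env$z refers to the function argument z, never to a column of data_village
  data_z <- filter(data_village, Z == .env$z)
  mean(data_z$Y)
}

result <- data %>%
  group_by(village) %>%
  # pick(Y, Z) hands the helper only the current village's rows
  summarize(Y_village = Y_hat_village_df(pick(Y, Z), z))
result
```

Passing plain columns, as in the answer, is still simpler. `pick()` is mainly useful when the helper genuinely needs several columns as a data frame.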

answered Oct 26 '22 at 23:10 by talat