How can we use a custom function after group_by()?
I checked similar posts (1, 2, and 3), but my current code returns the same values for all groups.
> data
  village     A     Z     Y
  <chr>   <int> <int> <dbl>
1 a           1     1   500
2 a           1     1   400
3 a           1     0   800
4 b           1     0   300
5 b           1     1   700
z <- 1
data %>%
  group_by(village) %>%
  summarize(Y_village = Y_hat_village(., z))
Y_hat_village <- function(data_village, z){
  # Calculate the mean for a specific z in a village
  data_z <- data_village %>% filter(Z==get("z"))
  return(mean(data_z$Y))
}
I want to have (500 + 400)/2 = 450 for village "a" and 700 for village "b".
It's easier to understand if you start by writing it without an extra function. With your data stored in df, that would be:
df %>%
  group_by(village) %>%
  summarize(Y_village = mean(Y[Z == z]))
## A tibble: 2 x 2
#  village Y_village
#  <fct>       <dbl>
#1 a            450.
#2 b            700.
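To see what mean(Y[Z == z]) computes within a single group, here is a small base R sketch using village a's values from the question. The names Y_a, Z_a, keep and mean_a are made up for illustration; inside summarize(), dplyr supplies the per-group Y and Z for you.

```r
# Values for village "a", copied from the question's data
Y_a <- c(500, 400, 800)
Z_a <- c(1L, 1L, 0L)
z <- 1

# Z_a == z is a logical vector; indexing Y_a with it keeps only matching rows
keep <- Z_a == z            # TRUE TRUE FALSE
mean_a <- mean(Y_a[keep])   # mean of 500 and 400
print(mean_a)               # 450
```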
Hence, your function should be something like
Y_hat_village <- function(Ycol, Zcol, z){
  mean(Ycol[Zcol == z])
}
And then using it:
df %>%
  group_by(village) %>%
  summarize(Y_village = Y_hat_village(Y, Z, z))
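Note that the function I wrote only works on atomic vectors, which you can pass directly from within summarise(). You don't need to pass the whole data.frame to it. In your version, . refers to the entire data frame coming through the pipe rather than the current group, which is why every village got the same value.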
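If you would rather keep a function that takes a data frame, one possible approach is group_modify(), which calls the function once per group with only that group's rows. The sketch below rebuilds the question's data by hand; Y_hat_village_df, pooled and by_village are hypothetical names, and village is a character column here, as in the question.

```r
library(dplyr)

# Data from the question
df <- data.frame(
  village = c("a", "a", "a", "b", "b"),
  A = c(1L, 1L, 1L, 1L, 1L),
  Z = c(1L, 1L, 0L, 0L, 1L),
  Y = c(500, 400, 800, 300, 700),
  stringsAsFactors = FALSE
)
z <- 1

# Hypothetical data-frame variant: mean of Y over the rows where Z equals z
Y_hat_village_df <- function(data_village, z) {
  mean(data_village$Y[data_village$Z == z])
}

# Called on the whole data frame, it pools all villages into one value
pooled <- Y_hat_village_df(df, z)

# group_modify() passes each group's rows as .x; the function must return a data frame
by_village <- df %>%
  group_by(village) %>%
  group_modify(~ data.frame(Y_village = Y_hat_village_df(.x, z))) %>%
  ungroup()

print(pooled)
print(by_village)
```

The pooled call returns (500 + 400 + 700)/3, roughly 533.3, which matches the "same value for all groups" symptom in the question. The grouped call returns 450 for village a and 700 for village b.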