dplyr: passing a grouped tibble to a custom function

Tags:

r

dplyr

(The following scenario simplifies my actual situation)
My data comes from villages, and I would like to summarize an outcome variable by a village variable.

> data
   village     A     Z      Y 
     <chr> <int> <int>   <dbl> 
 1       a     1     1   500     
 2       a     1     1   400     
 3       a     1     0   800  
 4       b     1     0   300  
 5       b     1     1   700

For example, I would like to calculate, for each village, the mean of Y using only the rows where Z == z. With z = 1, I want (500 + 400)/2 = 450 for village "a" and 700 for village "b".

Please note that the actual situation is more complicated and I cannot directly use this answer, but the point is I need to pass a grouped tibble and a global variable (z) to my function.

z <- 1 # z takes 0 or 1
data %>%
    group_by(village) %>% # grouping by village
    summarize(Y_village = Y_hat_village(., z)) # pass a part of tibble and a global variable

Y_hat_village <- function(data_village, z){
    # This function takes a part of tibble (`data_village`) and a variable `z`
    # Calculate the mean for a specific z in a village
    data_z <- data_village %>% filter(Z==get("z"))
    return(mean(data_z$Y))
}

However, I found that the dot (.) passes the entire tibble rather than the current group, so the code above returns the same value for every group.

asked Jun 19 '18 12:06

user2978524

2 Answers

There are a couple of things you can simplify. One is in your function: since you're passing a value z into the function, you don't need get("z"); the argument z is already visible inside the function body. You can pass in the z from the global environment, or, more safely, store the value under some other name so you don't run into scoping issues, and pass that to the function. In this case, I'm calling it z_val.

library(tidyverse)

z_val <- 1

Y_hat_village2 <- function(data, z) {
  data_z <- data %>% filter(Z == z)
  return(mean(data_z$Y))
}
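
As a quick check of the scoping point above, here is a minimal sketch. It assumes the question's five rows rebuilt as a tibble named df, and it deliberately sets a global z that disagrees with the argument; village_a, mean_z1 and mean_z0 are names made up for this illustration.

```r
library(dplyr)

# Question's sample data, rebuilt so this block runs on its own
df <- tibble(
  village = c("a", "a", "a", "b", "b"),
  A = c(1L, 1L, 1L, 1L, 1L),
  Z = c(1L, 1L, 0L, 0L, 1L),
  Y = c(500, 400, 800, 300, 700)
)

Y_hat_village2 <- function(data, z) {
  data_z <- data %>% filter(Z == z)
  mean(data_z$Y)
}

# A global z that conflicts with the value passed as an argument
z <- 0

# One village's rows, selected by hand
village_a <- df %>% filter(village == "a")

# The argument wins over the global, so no get("z") is needed
mean_z1 <- Y_hat_village2(village_a, z = 1)  # mean of 500 and 400
mean_z0 <- Y_hat_village2(village_a, z = 0)  # mean of 800
```

If the function picked up the global z instead of its argument, both calls would return the same number.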

You can call the function on each group using do, which gives you a list-column, and then unnest that column. Here df holds the question's data. Again, note that I'm passing the variable z_val to the argument z.

df %>%
  group_by(village) %>%
  do(y_hat = Y_hat_village2(., z = z_val)) %>%
  unnest(y_hat)
#> # A tibble: 2 x 2
#>   village y_hat
#>   <chr>   <dbl>
#> 1 a         450
#> 2 b         700

However, do has since been superseded in dplyr, and purrr::map is one alternative, though I am still getting the hang of it. In this case, you can group and nest, which gives a list-column of data frames called data, then map over that column and again supply z = z_val. Because map returns a list, y_hat has to be unnested; after that, the original data is still there as a nested column, which is useful since you wanted access to the rest of the columns.

df %>%
  group_by(village) %>%
  nest() %>%
  mutate(y_hat = map(data, ~Y_hat_village2(., z = z_val))) %>%
  unnest(y_hat)
#> # A tibble: 2 x 3
#>   village data             y_hat
#>   <chr>   <list>           <dbl>
#> 1 a       <tibble [3 × 3]>   450
#> 2 b       <tibble [2 × 3]>   700
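
If only the per-village number is needed, a variant of the same idea is to use purrr::map_dbl, which returns a plain numeric vector, so no unnest step is required. This is a sketch under the assumption that Y_hat_village2 always returns a single number; res_map_dbl is a name chosen for this example, and the data is rebuilt inline so the block runs on its own.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Question's sample data, rebuilt so this block runs on its own
df <- tibble(
  village = c("a", "a", "a", "b", "b"),
  A = c(1L, 1L, 1L, 1L, 1L),
  Z = c(1L, 1L, 0L, 0L, 1L),
  Y = c(500, 400, 800, 300, 700)
)

Y_hat_village2 <- function(data, z) {
  data_z <- data %>% filter(Z == z)
  mean(data_z$Y)
}

z_val <- 1

res_map_dbl <- df %>%
  group_by(village) %>%
  nest() %>%
  # map_dbl() requires each call to return exactly one number
  mutate(y_hat = map_dbl(data, ~ Y_hat_village2(.x, z = z_val))) %>%
  ungroup() %>%
  select(village, y_hat)
```

Dropping the data column with select is optional; keep it if the rest of the columns are needed later.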

Just to check that everything works okay, I also passed in z = 0 to check (1) for scoping issues and (2) that other values of z work.

df %>%
  group_by(village) %>%
  nest() %>%
  mutate(y_hat = map(data, ~Y_hat_village2(., z = 0))) %>%
  unnest(y_hat)
#> # A tibble: 2 x 3
#>   village data             y_hat
#>   <chr>   <list>           <dbl>
#> 1 a       <tibble [3 × 3]>   800
#> 2 b       <tibble [2 × 3]>   300
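
On the original symptom: with the magrittr pipe, a dot inside summarize() stands for whatever is on the left of %>%, which is the whole grouped tibble, not the current group's rows. Another way to hand each group's rows to a function is dplyr::group_modify (available in reasonably recent dplyr versions), where .x is the slice for one group. The following is a sketch, not the answer's original method; res_z1 and res_z0 are names made up here.

```r
library(dplyr)

# Question's sample data, rebuilt so this block runs on its own
df <- tibble(
  village = c("a", "a", "a", "b", "b"),
  A = c(1L, 1L, 1L, 1L, 1L),
  Z = c(1L, 1L, 0L, 0L, 1L),
  Y = c(500, 400, 800, 300, 700)
)

Y_hat_village2 <- function(data, z) {
  data_z <- data %>% filter(Z == z)
  mean(data_z$Y)
}

z_val <- 1

# .x is one village's rows (without the grouping column);
# the function passed to group_modify() must return a data frame
res_z1 <- df %>%
  group_by(village) %>%
  group_modify(~ tibble(y_hat = Y_hat_village2(.x, z = z_val))) %>%
  ungroup()

# Same call with z = 0, mirroring the check above
res_z0 <- df %>%
  group_by(village) %>%
  group_modify(~ tibble(y_hat = Y_hat_village2(.x, z = 0))) %>%
  ungroup()
```

The result has one row per village with columns village and y_hat, with no nested column left over.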

answered Nov 01 '22 19:11

camille

As an extension/modification to @patL's answer, you can also wrap the tidyverse solution within purrr::map to return a list of two tibbles, one for each z value:

z <- c(0, 1)
map(z, ~ df %>%
  filter(Z == .x) %>%
  group_by(village) %>%
  summarise(Y.mean = mean(Y)))
#[[1]]
## A tibble: 2 x 2
#  village Y.mean
#  <fct>    <dbl>
#1 a         800.
#2 b         300.
#
#[[2]]
## A tibble: 2 x 2
#  village Y.mean
#  <fct>    <dbl>
#1 a         450.
#2 b         700.

Sample data

df <- read.table(text =
    "  village     A     Z      Y
 1       a     1     1   500
 2       a     1     1   400
 3       a     1     0   800
 4       b     1     0   300
 5       b     1     1   700  ", header = TRUE)
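
If a single table is more convenient than a list of tibbles, one possible variation is to group by Z as well as village. This is only a sketch for the simplified scenario, where the per-group computation is a plain mean; res_by_z is a name made up for this example, and the sample data is repeated so the block runs on its own.

```r
library(dplyr)

# Same sample data as above
df <- read.table(text =
    "  village     A     Z      Y
 1       a     1     1   500
 2       a     1     1   400
 3       a     1     0   800
 4       b     1     0   300
 5       b     1     1   700  ", header = TRUE)

# One row per (Z, village) combination instead of one tibble per z
res_by_z <- df %>%
  group_by(Z, village) %>%
  summarise(Y.mean = mean(Y)) %>%
  ungroup()
```

Filtering res_by_z on Z then recovers either of the two tibbles from the list version.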

answered Nov 01 '22 21:11

Maurits Evers
