Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

dplyr and Non-standard evaluation (NSE)

Tags:

r

dplyr

I'm trying to write a function that takes in the name of a data frame and a column to summarize by using dplyr, then returns the summarized data frame. I've tried a bunch of permutations of interp() from the lazyeval package, but I've spent way too much time trying to get it to work. So, I wrote a "static" version of the function I want here:

summarize.df.static <- function(){
  temp_df <- mtcars %>%
    group_by(cyl) %>%
    summarize(qsec = mean(qsec),
              mpg=mean(mpg))
  return(temp_df)
}

new_df <- summarize.df.static()
head(new_df)

Here is the start of the dynamic version I'm stuck on:

summarize.df.dynamic <- function(df_in,sum_metric_in){
  temp_df <- df_in %>%
    group_by(cyl) %>%
    summarize_(qsec = mean(qsec),
              sum_metric_in=mean(sum_metric_in)) # some mix of interp()
  return(temp_df)
}

new_df <- summarize.df.dynamic(mtcars,"mpg")
head(new_df)

Note that I want the column name in this example to come from the parameter passed-in as well (mpg in this case). Also note that the qsec column is static, ie not passed-in.

Below is the correct answer posted by "docendo discimus":

summarize.df.dynamic<- function(df_in, sum_metric_in){
  temp_df <- df_in %>%
    group_by(cyl) %>%
    summarize_(qsec = ~mean(qsec), 
               xyz = interp(~mean(var), var = as.name(sum_metric_in))) 

  names(temp_df)[names(temp_df) == "xyz"] <- sum_metric_in  
  return(temp_df)
}

new_df <- summarize.df.dynamic(mtcars,"mpg")
head(new_df)

#  cyl     qsec      mpg
#1   4 19.13727 26.66364
#2   6 17.97714 19.74286
#3   8 16.77214 15.10000

new_df <- summarize.df.dynamic(mtcars,"disp")
head(new_df)

#  cyl     qsec     disp
#1   4 19.13727 105.1364
#2   6 17.97714 183.3143
#3   8 16.77214 353.1000
like image 576
Tyler Muth Avatar asked Jan 14 '15 14:01

Tyler Muth


People also ask

What is non standard evaluation in R?

Non-standard evaluation shows you how subset() works by combining substitute() with eval() to allow you to succinctly select rows from a data frame. Scoping issues discusses scoping issues specific to NSE, and will show you how to resolve them.

What is dplyr used for?

dplyr is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges: mutate() adds new variables that are functions of existing variables. select() picks variables based on their names. filter() picks cases based on their values.

Why do we use dplyr in R?

The dplyr package makes these steps fast and easy: By constraining your options, it helps you think about your data manipulation challenges. It provides simple “verbs”, functions that correspond to the most common data manipulation tasks, to help you translate your thoughts into code.

Is dplyr part of tidyverse?

Similarly to readr , dplyr and tidyr are also part of the tidyverse. These packages were loaded in R's memory when we called library(tidyverse) earlier.


2 Answers

For the specific example (with static "qsec" etc) you could do:

library(dplyr)
library(lazyeval)
summarize.df <- function(data, sum_metric_in){
  data <- data %>%
    group_by(cyl) %>%
    summarize_(qsec = ~mean(qsec), 
               xyz = interp(~mean(var), var = as.name(sum_metric_in))) 

  names(data)[names(data) == "xyz"] <- sum_metric_in  
  data
}

summarize.df(mtcars, "mpg")
#Source: local data frame [3 x 3]
#
#  cyl     qsec      mpg
#1   4 19.13727 26.66364
#2   6 17.97714 19.74286
#3   8 16.77214 15.10000

AFAIK you cannot (yet?) supply the input "sum_metric_in" to dplyr::rename which you would typically use to rename the column, which is why I did it different in the example.

like image 138
talat Avatar answered Sep 27 '22 22:09

talat


You could use paste or ~ to get a quote input that summarize_ understands.

df_in %>%
  group_by(cyl) %>%
  summarize_(qsec = ~mean(qsec),
             sum_metric_in=paste0('mean(', sum_metric_in, ')'))
like image 44
shadow Avatar answered Sep 27 '22 22:09

shadow