dplyr and Non-standard evaluation (NSE)

Tags: r, dplyr

I'm trying to write a function that takes a data frame and the name of a column to summarize using dplyr, then returns the summarized data frame. I've tried a bunch of permutations of interp() from the lazyeval package, and I've spent way too much time without getting it to work. So, I wrote a "static" version of the function I want here:

summarize.df.static <- function(){
  temp_df <- mtcars %>%
    group_by(cyl) %>%
    summarize(qsec = mean(qsec),
              mpg=mean(mpg))
  return(temp_df)
}

new_df <- summarize.df.static()
head(new_df)

Here is the start of the dynamic version I'm stuck on:

summarize.df.dynamic <- function(df_in,sum_metric_in){
  temp_df <- df_in %>%
    group_by(cyl) %>%
    summarize_(qsec = mean(qsec),
              sum_metric_in=mean(sum_metric_in)) # some mix of interp()
  return(temp_df)
}

new_df <- summarize.df.dynamic(mtcars,"mpg")
head(new_df)

Note that I want the name of the output column to come from the parameter that is passed in as well (mpg in this case). Also note that the qsec column is static, i.e. not passed in.

Below is the correct answer posted by "docendo discimus":

summarize.df.dynamic<- function(df_in, sum_metric_in){
  temp_df <- df_in %>%
    group_by(cyl) %>%
    summarize_(qsec = ~mean(qsec), 
               xyz = interp(~mean(var), var = as.name(sum_metric_in))) 

  names(temp_df)[names(temp_df) == "xyz"] <- sum_metric_in  
  return(temp_df)
}

new_df <- summarize.df.dynamic(mtcars,"mpg")
head(new_df)

#  cyl     qsec      mpg
#1   4 19.13727 26.66364
#2   6 17.97714 19.74286
#3   8 16.77214 15.10000

new_df <- summarize.df.dynamic(mtcars,"disp")
head(new_df)

#  cyl     qsec     disp
#1   4 19.13727 105.1364
#2   6 17.97714 183.3143
#3   8 16.77214 353.1000

asked Jan 14 '15 14:01

Tyler Muth

2 Answers

For the specific example (with static "qsec" etc) you could do:

library(dplyr)
library(lazyeval)
summarize.df <- function(data, sum_metric_in){
  data <- data %>%
    group_by(cyl) %>%
    summarize_(qsec = ~mean(qsec), 
               xyz = interp(~mean(var), var = as.name(sum_metric_in))) 

  names(data)[names(data) == "xyz"] <- sum_metric_in  
  data
}

summarize.df(mtcars, "mpg")
#Source: local data frame [3 x 3]
#
#  cyl     qsec      mpg
#1   4 19.13727 26.66364
#2   6 17.97714 19.74286
#3   8 16.77214 15.10000

AFAIK you cannot (yet?) supply the input "sum_metric_in" to dplyr::rename, which you would typically use to rename the column. That is why I renamed it through names() in the example instead.
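For readers on a newer dplyr (0.7.0 or later, which is an assumption about your installation), the same result can probably be had without lazyeval and without the temporary "xyz" column. The sketch below is not part of the original answer; the function name summarize_df_tidy is hypothetical. It uses the .data pronoun to look up the column by its string name and `!!name :=` to set the output column name.

```r
library(dplyr)

# Hypothetical name; same idea as summarize.df above, without lazyeval.
summarize_df_tidy <- function(data, sum_metric_in) {
  data %>%
    group_by(cyl) %>%
    summarize(
      qsec = mean(qsec),
      # `!!sum_metric_in :=` uses the string as the output column name;
      # `.data[[sum_metric_in]]` looks the column up by that string.
      !!sum_metric_in := mean(.data[[sum_metric_in]])
    )
}

res <- summarize_df_tidy(mtcars, "mpg")
print(res)
```

Calling it with "disp" instead of "mpg" should give a disp column in the same way, since nothing in the function body hard-codes the metric.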

answered Sep 27 '22 22:09

talat

You could use paste or ~ to build a quoted input that summarize_ understands.

df_in %>%
  group_by(cyl) %>%
  summarize_(qsec = ~mean(qsec),
             sum_metric_in=paste0('mean(', sum_metric_in, ')'))
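Note that the snippet above names the result column literally sum_metric_in rather than mpg. One way to get the dynamic name as well, sketched here on the assumption that summarize_ and its .dots argument are available in your dplyr (they are deprecated in later releases and may emit warnings), is to build the expressions as a list and attach the names with setNames(). The wrapper name summarize_df_dots is hypothetical.

```r
library(dplyr)

# Hypothetical wrapper name, for illustration only.
summarize_df_dots <- function(df_in, sum_metric_in) {
  # One formula and one string such as "mean(disp)"; summarize_ accepts both.
  dots <- list(~mean(qsec), paste0("mean(", sum_metric_in, ")"))
  df_in %>%
    group_by(cyl) %>%
    # setNames() supplies the output column names, including the dynamic one.
    summarize_(.dots = setNames(dots, c("qsec", sum_metric_in)))
}

res_dots <- summarize_df_dots(mtcars, "disp")
print(res_dots)
```

Building code from strings with paste0 is fragile if the column name is not a syntactically valid R name, so the formula-based interp() approach is likely the safer of the two.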

answered Sep 27 '22 22:09

shadow
