Using dplyr within a function, non-standard evaluation

Tags: r, dplyr, nse

I'm trying to get my head around non-standard evaluation (NSE) as used by dplyr, but without success. I'd like a short function that returns summary statistics (N, mean, sd, median, IQR, min, max) for a specified set of variables.

Simplified version of my function...

my_summarise <- function(df = temp,
                         to.sum = 'eg1',
                         ...){
    ## Summarise
    results <- summarise_(df,
                          n = ~n(),
                          mean = mean(~to.sum, na.rm = TRUE))
    return(results)
}

And running it with some dummy data...

set.seed(43290)
temp <- cbind(rnorm(n = 100, mean = 2, sd = 4),
              rnorm(n = 100, mean = 3, sd = 6)) %>% as.data.frame()
names(temp) <- c('eg1', 'eg2')
mean(temp$eg1)
  [1] 1.881721
mean(temp$eg2)
  [1] 3.575819
my_summarise(df = temp, to.sum = 'eg1')
    n mean
1 100   NA

N is calculated, but the mean is not, and I can't figure out why.

Ultimately I'd like my function to be more general, along the lines of...

my_summarise <- function(df = temp,
                         group.by = 'group',
                         to.sum = c('eg1', 'eg2'),
                         ...){
    results <- list()
    ## Select columns
    df <- dplyr::select_(df, .dots = c(group.by, to.sum))
    ## Summarise overall
    results$all <- summarise_each(df,
                                  funs(n = ~n(),
                                       mean = mean(~to.sum, na.rm = TRUE)))
    ## Summarise by specified group
    results$by.group <- group_by_(df, ~to.group) %>%
                        summarise_each(df,
                                       funs(n = ~n(),
                                       mean = mean(~to.sum, na.rm = TRUE)))        
    return(results)
}

...but before I move on to this more complex version (for which I was using this example for guidance), I need to get the evaluation working in the simple version first, as that's the stumbling block. The call to dplyr::select() works OK.

Appreciate any advice as to where I'm going wrong.

Thanks in advance

asked Oct 13 '16 09:10 by slackline

1 Answer

The basic idea is that you have to actually build the appropriate call yourself, most easily done with the lazyeval package.
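
For context, the NA in the question most likely comes from the fact that mean(~to.sum, na.rm = TRUE) is an ordinary call that R evaluates straight away, before summarise_ ever sees it. The argument is a formula object rather than a column, so mean() warns and returns NA. A minimal base-R sketch of that behaviour (no dplyr involved, variable names are only for illustration):

```r
to.sum <- "eg1"

# The formula ~to.sum is passed to mean() as an object, not as a column.
# mean() cannot average a formula, so it warns and returns NA.
res <- suppressWarnings(mean(~to.sum, na.rm = TRUE))

print(res)
print(class(~to.sum))
```

That NA is then handed to summarise_ as a constant, which would explain why n is computed but mean is not.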

In this case you want to programmatically create a call that looks like ~mean(eg1, na.rm = TRUE). This is how:

my_summarise <- function(df = temp,
                         to.sum = 'eg1',
                         ...){
  ## Summarise
  results <- summarise_(df,
                        n = ~n(),
                        mean = lazyeval::interp(~mean(x, na.rm = TRUE),
                                                x = as.name(to.sum)))
  return(results)
}
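
As a side note, summarise_ and the lazyeval approach were deprecated from dplyr 0.7 onwards. A minimal sketch of the same idea using the .data pronoun, assuming a recent dplyr is installed; the function name my_summarise_tidy and the way the dummy data is built here are only for illustration:

```r
library(dplyr)

# Hypothetical variant of my_summarise: the column is looked up by its
# name (a string) through the .data pronoun inside summarise().
my_summarise_tidy <- function(df, to.sum = "eg1") {
  summarise(df,
            n = n(),
            mean = mean(.data[[to.sum]], na.rm = TRUE))
}

# Dummy data along the lines of the question
set.seed(43290)
temp <- data.frame(eg1 = rnorm(n = 100, mean = 2, sd = 4),
                   eg2 = rnorm(n = 100, mean = 3, sd = 6))

res <- my_summarise_tidy(temp, to.sum = "eg1")
print(res)
```

The result should contain the row count and the same value as mean(temp$eg1).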

Here is what I do when I struggle to get things working:

  1. Remember that, just like the ~n() you already have, the call will have to start with a ~.
  2. Write the correct call with the actual variable and see if it works (~mean(eg1, na.rm = TRUE)).
  3. Use lazyeval::interp to recreate that call, and check this by running only the interp to visually see what it is doing.
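
Steps 2 and 3 can be sketched roughly as follows, assuming the lazyeval package is installed. The names target and built are only for illustration:

```r
library(lazyeval)

to.sum <- "eg1"

# Step 2: the call written by hand with the actual variable
target <- ~mean(eg1, na.rm = TRUE)

# Step 3: the same call built programmatically
built <- interp(~mean(x, na.rm = TRUE), x = as.name(to.sum))

# Print the result to check visually what interp produced
print(built)

# Compare the right-hand sides of the two formulas as text
same_call <- identical(deparse(target[[2]]), deparse(built[[2]]))
print(same_call)
```

If the two calls do not match, the printed formula usually shows where the interpolation went wrong.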

In this case my first attempt would probably be interp(~mean(x, na.rm = TRUE), x = to.sum). But running that gives ~mean("eg1", na.rm = TRUE), which treats eg1 as a character string instead of a variable name. So we use as.name, as described in vignette("nse").
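
The difference between a string and a name can also be seen without dplyr or lazyeval. A minimal base-R sketch, using bquote() in place of interp and a small made-up data frame called toy (both are assumptions for illustration only):

```r
toy <- data.frame(eg1 = c(1, 2, 3, NA))
to.sum <- "eg1"

# Inserting the string builds mean("eg1", na.rm = TRUE)
call_chr <- bquote(mean(.(to.sum), na.rm = TRUE))

# Inserting a name builds mean(eg1, na.rm = TRUE)
call_sym <- bquote(mean(.(as.name(to.sum)), na.rm = TRUE))

# Evaluate both calls with the data frame as the lookup environment.
# The string version averages the text "eg1" and returns NA (with a warning);
# the name version finds the column eg1 in toy.
res_chr <- suppressWarnings(eval(call_chr, toy))
res_sym <- eval(call_sym, toy)

print(res_chr)
print(res_sym)
```

Only the version built with as.name looks up the column, which is the behaviour summarise_ needs.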

answered Sep 20 '22 04:09 by Axeman