dplyr: standard evaluation for mutate with quoted variable names


How would I go about using mutate with a function that accepts a list of variable names, such as the one below? My presumption is that I need standard evaluation here, and hence mutate_, but I am not entirely confident on this point.

createSum = function(data, variableNames) {
  data %>% 
    mutate_(sumvar = interp(~ sum(var, na.rm = TRUE), 
                            var = as.name(paste(as.character(variableNames), collapse =","))))

}

Here is an MWE that strips the function to its core logic and demonstrates what I am trying to achieve:

library(dplyr)
library(lazyeval)

# function to make random table with given column names
makeTable = function(colNames, sampleSize) {
  liSample = lapply(colNames, function(week) rnorm(sampleSize))
  names(liSample) = as.character(colNames)
  # check.names = FALSE keeps the date strings as (non-syntactic) column names;
  # as_tibble() replaces the deprecated tbl_df()
  return(as_tibble(data.frame(liSample, check.names = FALSE)))
}

# create some sample data with the column name patterns required
weekDates = seq.Date(from = as.Date("2014-01-01"),
                     to = as.Date("2014-08-01"), by = "week")
dfTest = makeTable(weekDates, 10)

# test mutate on this table
dfTest %>% 
  mutate_(sumvar = interp(~ sum(var, na.rm = TRUE), 
                          var = as.name(paste(as.character(weekDates), collapse =","))))

The expected output is what would be returned by:

rowSums(dfTest[, as.character(weekDates)])
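For reference, the attempted interp() call cannot produce this, because as.name(paste(..., collapse = ",")) builds one single symbol whose name is the whole comma-separated string, not several column references. A small base R sketch (the column names and data frame are illustrative, not the question's data):

```r
# Illustrative column names with the same date pattern as the question
cols <- c("2014-01-01", "2014-01-08", "2014-01-15")
df <- data.frame(`2014-01-01` = 1:2, `2014-01-08` = 3:4, `2014-01-15` = 5:6,
                 check.names = FALSE)

# What the attempted interp(..., var = as.name(paste(...))) receives
attempt <- as.name(paste(cols, collapse = ","))

is.name(attempt)               # TRUE: it is a single symbol ...
as.character(attempt)          # "2014-01-01,2014-01-08,2014-01-15"
as.character(attempt) %in% names(df)  # FALSE: ... naming a column that does not exist
```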


asked May 07 '15 17:05

tchakravarty

2 Answers

I think this is what you're after:

createSum = function(data, variableNames) {
    data %>% 
        mutate_(sumvar = paste(as.character(variableNames), collapse ="+"))
}
createSum(dfTest, weekDates)

Here we just supply a character value rather than using interp(), because you can't pass a list of names as a single parameter to a function. Also, sum() would do some undesired collapsing: operations in mutate are not performed row by row; each column is passed in as a whole vector, so sum() returns a single total for the entire table.
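The difference between sum() and `+` can be seen in a small base R sketch (toy columns `a` and `b`, not the question's data). sum() reduces everything to one number, which mutate would then recycle to every row, while `+`, which is what the pasted "col1+col2+..." string builds, works element by element:

```r
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))

# sum() collapses all of its arguments into a single total
with(df, sum(a, b))          # 66
# `+` is vectorised, giving one value per row
with(df, a + b)              # 11 22 33
# rowSums() gives the same per-row values (named by row)
rowSums(df[, c("a", "b")])
```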

The other problem with this example is that you set check.names=FALSE in your data.frame, which means you've created column names that are not valid symbols (a string like 2014-01-01 is otherwise parsed as subtraction). You can explicitly wrap your variable names in backticks if you like:

createSum(dfTest , paste0("`", weekDates,"`"))

but in general it would be better not to use invalid names.
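mutate_() has since been deprecated in favour of tidy evaluation. A sketch of the same idea without pasting strings, assuming dplyr 0.7 or later plus rlang: build the `col1 + col2 + ...` expression from symbols and splice it into mutate() with `!!`. Because each name is turned into a symbol directly instead of being parsed from text, the non-syntactic date names need no backticks. The small data frame below is illustrative.

```r
library(dplyr)
library(rlang)

createSum <- function(data, variableNames) {
  cols <- as.character(variableNames)
  # syms() turns each string into a symbol; Reduce() chains them with `+`,
  # giving `2014-01-01` + `2014-01-08` + ...
  sumExpr <- Reduce(function(lhs, rhs) call("+", lhs, rhs), syms(cols))
  # !! splices the constructed expression into mutate()
  mutate(data, sumvar = !!sumExpr)
}

df <- data.frame(`2014-01-01` = c(1, 2), `2014-01-08` = c(10, 20),
                 check.names = FALSE)
createSum(df, c("2014-01-01", "2014-01-08"))$sumvar  # 11 22
```

Assuming dplyr 1.1.0 or later, `mutate(data, sumvar = rowSums(pick(all_of(cols))))` should give the same row sums without building an expression by hand.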


answered Oct 27 '22 21:10

MrFlick

I don't know if this is an "officially sanctioned" dplyr way, but this is a possibility:

weekDates = as.character(weekDates) # more convenient

dfTest %>% mutate(sumvar = Reduce(`+`, lapply(weekDates, get, .)))
#or
dfTest %>% mutate(sumvar = rowSums(as.data.frame(lapply(weekDates, get, .))))

This does carry potentially significant performance penalties, depending on your particular usage: in addition to dplyr's regular copying of the entire data, I think it also copies it a second time during that internal computation. You can look into data.table to avoid the extra copying by adding columns in place (and using .SDcols to avoid the second copy), and you'll get arguably better syntax.
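A minimal sketch of that data.table route, assuming the data.table package is installed; the toy table and column names are illustrative:

```r
library(data.table)

# data.table() keeps non-syntactic names such as "2014-01-01" as-is
dt <- data.table(`2014-01-01` = c(1, 2), `2014-01-08` = c(10, 20),
                 check.names = FALSE)
cols <- c("2014-01-01", "2014-01-08")

# := adds the new column by reference (dt itself is modified, not copied);
# .SDcols restricts .SD to the selected columns before rowSums() sees them
dt[, sumvar := rowSums(.SD), .SDcols = cols]
dt$sumvar  # 11 22
```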

answered Oct 27 '22 22:10

eddi
