How would I go about using mutate (my presumption is that I need standard evaluation in my case, and hence mutate_, but I am not entirely confident on this point) when using a function that accepts a list of variable names, such as this:
createSum = function(data, variableNames) {
  data %>%
    mutate_(sumvar = interp(~ sum(var, na.rm = TRUE),
                            var = as.name(paste(as.character(variableNames), collapse = ","))))
}
Here is an MWE that strips the function to its core logic and demonstrates what I am trying to achieve:
library(dplyr)
library(lazyeval)
# function to make random table with given column names
makeTable = function(colNames, sampleSize) {
  liSample = lapply(colNames, function(week) {
    sample = rnorm(sampleSize)
  })
  names(liSample) = as.character(colNames)
  return(tbl_df(data.frame(liSample, check.names = FALSE)))
}
# create some sample data with the column name patterns required
weekDates = seq.Date(from = as.Date("2014-01-01"),
                     to = as.Date("2014-08-01"), by = "week")
dfTest = makeTable(weekDates, 10)
# test mutate on this table
dfTest %>%
  mutate_(sumvar = interp(~ sum(var, na.rm = TRUE),
                          var = as.name(paste(as.character(weekDates), collapse = ","))))
Expected output here is what would be returned by:
rowSums(dfTest[, as.character(weekDates)])
%>% is the forward pipe operator in R. It forwards a value, or the result of an expression, into the next function call or expression, which provides a mechanism for chaining commands. It is defined by the package magrittr (CRAN) and is heavily used by dplyr (CRAN).
In R, the mutate function creates new variables in a data set. It is provided by the dplyr package, an add-on to R that includes functions for selecting, filtering, grouping, and arranging data.
mutate() adds new variables that are functions of existing variables.
dplyr is a package for making tabular data wrangling easier by using a limited set of functions that can be combined to extract and summarize insights from your data. It pairs nicely with tidyr which enables you to swiftly convert between different data formats (long vs. wide) for plotting and analysis.
I think this is what you're after
createSum = function(data, variableNames) {
  data %>%
    mutate_(sumvar = paste(as.character(variableNames), collapse = "+"))
}
createSum(dfTest, weekDates)
Here we just supply a character string rather than using interp, because you can't pass a list of names as a single parameter to a function. In addition, sum() would collapse everything into a single value, because operations are not performed row-wise: each column is passed in as a whole vector.
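To make the difference concrete, here is a minimal base R sketch. The vectors a and b are made-up stand-ins for two columns, not part of the original data:

```r
# Two hypothetical "columns" of equal length
a <- c(1, 2, 3)
b <- c(10, 20, 30)

# sum() collapses everything it is given into a single number
total <- sum(a, b)

# `+` is vectorised, so it returns one value per row
elementwise <- a + b

print(total)        # 66
print(elementwise)  # 11 22 33
```

This is why pasting the names together with "+" gives a per-row sum, while wrapping them in sum() would not.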
The other problem with this example is that you set check.names=FALSE in your data.frame, which means you've created column names that are not syntactically valid symbols. You can explicitly wrap your variable names in backticks if you like:
createSum(dfTest , paste0("`", weekDates,"`"))
but in general it would be better not to use invalid names.
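As a side note for readers on newer dplyr releases, where mutate_ is deprecated: the following is a rough sketch of the same idea, assuming dplyr 1.0.0 or later (for across() and all_of()). The helper name create_sum and the two-column sample data are hypothetical, chosen only for illustration. Because the columns are selected by name as strings, the non-syntactic names need no backticks here.

```r
library(dplyr)

# Hypothetical helper; assumes dplyr >= 1.0.0 so that across() is available
create_sum <- function(data, variable_names) {
  data %>%
    mutate(sumvar = rowSums(across(all_of(variable_names)), na.rm = TRUE))
}

# Small made-up table with date-like (non-syntactic) column names
df <- data.frame(`2014-01-01` = c(1, 2),
                 `2014-01-08` = c(3, NA),
                 check.names = FALSE)

result <- create_sum(df, c("2014-01-01", "2014-01-08"))
print(result$sumvar)  # 4 2
```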
I don't know if this is an "officially sanctioned" dplyr way, but this is a possibility:
weekDates = as.character(weekDates) # more convenient
dfTest %>% mutate(sumvar = Reduce(`+`, lapply(weekDates, get, .)))
#or
dfTest %>% mutate(sumvar = rowSums(as.data.frame(lapply(weekDates, get, .))))
This does carry potentially significant performance penalties, depending on your particular usage: in addition to dplyr's regular copying of the entire data, I think it also copies it a second time during that internal computation. You can look into data.table to avoid the extra copying by adding columns in place (and using .SDcols to avoid the second copy), and you'll get arguably better syntax.
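A minimal sketch of what that data.table approach might look like, assuming the data.table package is installed. The two-column table and the name cols are made up for illustration:

```r
library(data.table)

# Hypothetical column names, mirroring the date-like names in the question
cols <- c("2014-01-01", "2014-01-08")

# Small made-up table; data.table() keeps non-syntactic names by default
dt <- data.table(`2014-01-01` = c(1, 2),
                 `2014-01-08` = c(3, 4))

# := adds the column by reference; .SDcols restricts .SD to the listed columns
dt[, sumvar := rowSums(.SD), .SDcols = cols]

print(dt$sumvar)  # 4 6
```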