It comes down to this:
df = data.frame(a = 1:10)
#example function that takes optional arguments
mymean <- function(x, w = 1, s = 0) { s + mean(w * x) }
summarize_(df, m = "mean(a)")
#> m
#> 1 5.5
summarize_(df, m = "mymean(a)")
#> Error: could not find function "mymean"
According to `vignette("nse")`, summarize_ must be given the formula syntax when it is used with non-standard summarizing functions.
Ultimately, I want to be able to wrap summarize_ in a function like so:
my_summary <- function(df, x, ...) {
  summarize_(df,
             m = "mean(a)",
             wm = "mymean(a, ...)") #gives error
}
#Partial working function
my_summary <- function(df, x, ...) {
  summarize_(df,
             m = "mean(a)", #works
             wm1 = interp(~mymean(a), a = as.name(x)), #works but doesn't allow ...
             wm2 = interp(~mymean(a, b),
                          .values = list(a = as.name(x),
                                         b = quote(...))), #doesn't work
             wm3 = interp(~mymean(a, ...), a = as.name(x))) #doesn't work
}
A working function would allow me to call:
my_summary(df, a)
#> 5.5
my_summary(df, a, w=5, s=2)
#> 29.5
The dplyr package provides a number of very useful functions for manipulating data frames in a way that reduces repetition, lowers the probability of making errors, and probably even saves you some typing. As an added bonus, you might find the dplyr grammar easier to read.
%>% is the forward pipe operator in R. It provides a mechanism for chaining commands: it forwards a value, or the result of an expression, into the next function call or expression. It is defined by the package magrittr (CRAN) and is heavily used by dplyr (CRAN).
dplyr functions use non-standard evaluation. That is why you do not have to quote your variable names when you do something like select(mtcars, mpg), and why select(mtcars, "mpg") did not work in older dplyr versions (recent releases accept strings in select()). When you use dplyr inside your own functions, you will likely want "standard evaluation" instead, where column names are passed around as values.
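To make the distinction concrete, here is a minimal base R sketch of the two styles. The helper names pick_column and pick_column_se are hypothetical and only illustrate the idea; this is not how dplyr itself is implemented.

```r
# Non-standard evaluation: capture the unevaluated argument and use its name.
# pick_column is a hypothetical helper, not part of dplyr.
pick_column <- function(df, col) {
  col_name <- deparse(substitute(col))  # e.g. the symbol mpg becomes "mpg"
  df[[col_name]]
}

# Standard evaluation: the column name arrives as an ordinary string value.
pick_column_se <- function(df, col_name) {
  df[[col_name]]
}

nse_result <- pick_column(mtcars, mpg)      # bare name, no quotes
se_result  <- pick_column_se(mtcars, "mpg") # quoted name

print(identical(nse_result, se_result))
```

The standard-evaluation version is the one that composes well inside other functions, because the column name can be stored in a variable and passed along.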
Like readr, dplyr and tidyr are part of the tidyverse, so all three are loaded together by library(tidyverse).
Since the problem is passing ... to the function, one solution is to construct the call via call and do.call (yes, both):
library(dplyr)
df = data.frame(a = 1:10)
mymean = function(x, w = 1, s = 0)
    s + mean(w * x)
my_summary = function (df, x, ...) {
    x = as.name(substitute(x))
    mycall = do.call(call, c('mymean', quote(x), list(...)))
    summarize_(df,
               m = lazyeval::interp(~mean(x), x = x),
               w = lazyeval::lazy_(mycall, environment()))
}
my_summary(df, a)
#> m w
#> 1 5.5 5.5
my_summary(df, a, w = 5, s = 2)
#> m w
#> 1 5.5 29.5
Incidentally, the above also fixes passing the column name. I couldn’t get your code to work, and I don’t think it would work that way.
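If it helps to see the call-building idea in isolation, here is a rough base R sketch with no dplyr or lazyeval involved. It swaps in as.call() for the do.call(call, ...) construction and evaluates the call against the data frame with eval(); my_summary_base is a hypothetical name used only for this illustration.

```r
df <- data.frame(a = 1:10)
mymean <- function(x, w = 1, s = 0) s + mean(w * x)

# Hypothetical base R variant: build the call by hand, then evaluate it
# with the data frame's columns in scope.
my_summary_base <- function(df, x, ...) {
  col <- as.name(substitute(x))  # the bare column name as a symbol
  # Assemble mymean(<col>, ...) as an unevaluated call; list(...) holds the
  # already-evaluated extra arguments together with their names.
  mycall <- as.call(c(list(as.name("mymean"), col), list(...)))
  data.frame(
    m = mean(df[[as.character(col)]]),
    w = eval(mycall, df, parent.frame())  # columns first, then the caller
  )
}

print(my_summary_base(df, a))
print(my_summary_base(df, a, w = 5, s = 2))
```

With the default arguments both columns should be the plain mean of a, and with w = 5, s = 2 the w column should be 2 + mean(5 * a). Note that do.call evaluates the arguments it is handed, which is why the version above it passes quote(x) rather than the symbol itself.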