Note: The described behaviour has been fixed in the dev version of dplyr. You can install the dev version using devtools::install_github("hadley/dplyr").
Please see this minimal example; I am using dplyr v0.3.0.2 and data.table v1.9.4
library(dplyr)
library(data.table)
f <- function(x, y, bad) {
  z <- data.table(x, y, key = "x")
  z2 <- z %>% group_by(x) %>% summarise(sum.bad = sum(y == bad))
  z2
}
f(rnorm(100), rnorm(100) < 0, bad = FALSE)
When I run the above I get
Error in `[.data.table`(dt, , list(sum.bad = sum(y == bad)), by = vars) :
object 'bad' not found
However, bad is clearly defined and in scope.
If I run the same code outside of a function, it works:
x <- rnorm(100)
y <- rnorm(100) <0
bad <- FALSE
z <- data.table(x,y, key = "x")
z2 <- z %>% group_by(x) %>% summarise(sum.bad = sum(y == bad))
z2
What is the issue here? Is it a bug with either data.table or dplyr?
This seems to be a problem with how dplyr sets up the environment for the data.table call. The problem appears in the dplyr:::summarise_.grouped_dt function. It currently looks like
function (.data, ..., .dots)
{
    dots <- lazyeval::all_dots(.dots, ..., all_named = TRUE)
    for (i in seq_along(dots)) {
        if (identical(dots[[i]]$expr, quote(n()))) {
            dots[[i]]$expr <- quote(.N)
        }
    }
    list_call <- lazyeval::make_call(quote(list), dots)
    call <- substitute(dt[, list_call, by = vars], list(list_call = list_call$expr))
    env <- dt_env(.data, parent.frame())
    out <- eval(call, env)
    grouped_dt(out, drop_last(groups(.data)), copy = FALSE)
}
<environment: namespace:dplyr>
and if we debug that function and look at the trace when it's called, we see
where 1: summarise_.grouped_dt(.data, .dots = lazyeval::lazy_dots(...))
where 2: summarise_(.data, .dots = lazyeval::lazy_dots(...))
where 3: summarise(., sum.bad = sum(y == bad))
where 4: function_list[[k]](value)
where 5: withVisible(function_list[[k]](value))
where 6: freduce(value, `_function_list`)
where 7: `_fseq`(`_lhs`)
where 8: eval(expr, envir, enclos)
where 9: eval(quote(`_fseq`(`_lhs`)), env, env)
where 10: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
where 11 at #3: z %>% group_by(x) %>% summarise(sum.bad = sum(y == bad))
where 12: f(rnorm(100), rnorm(100) < 0, bad = FALSE)
So the important line is
env <- dt_env(.data, parent.frame())
This sets up the chain of environments that determines where every variable in the call is looked up. It simply uses parent.frame(), which refers to the frame the function was called from. But since you actually jump through a few hoops to get to this function from your summarize call inside f(), this doesn't seem to be the right parent frame. If you instead run
env <- dt_env(.data, parent.frame(2))
in debug mode, that seems to get at the correct parent frame. So I think the problem is the jump from summarize() to summarize_(), because this
ff <- function(x, y, bad) {
  z <- data.table(x, y, key = "x")
  z2 <- z %>% group_by(x) %>% summarise_(.dots = list(sum.bad = quote(sum(y == bad))))
  z2
}
ff(rnorm(100), rnorm(100) < 0, bad = FALSE)
seems to work. So it's really dplyr that needs to set up the correct environment. The tricky part is that the correct frame appears to be different depending on whether you call summarize or summarize_ directly. Perhaps summarise() could change the environment when it calls summarise_, via eval(), so that both see the same parent.frame. But I'd probably file this as a bug report and let Hadley decide how to fix it. Something like
summarise <- function(.data, ...) {
  call <- match.call()
  call <- as.call(c(as.list(call)[1:2], list(.dots = as.list(call)[-(1:2)])))
  call[[1]] <- quote(summarise_)
  eval(call, envir = parent.frame())
}
would be a "traditional" way to do it. Not sure if the lazyeval package has nicer ways to do this or not.
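To make the frame problem concrete without dplyr or data.table, here is a minimal base R sketch. The functions inner(), outer_naive() and outer_forward() are hypothetical stand-ins I made up for illustration (roughly playing the roles of summarise_() and summarise()); they are not dplyr's actual code. The sketch assumes, as in the analysis above, that the inner function evaluates the user's expression against the data with parent.frame() as the fallback environment.

```r
# Hypothetical stand-in for the inner verb: evaluates `expr` using the columns
# of `data` first, then falls back to the frame of whoever called inner().
inner <- function(data, expr) {
  env <- parent.frame()
  eval(expr, data, env)
}

# Naive wrapper: inner() is called from the wrapper's own frame, so
# parent.frame() inside inner() is the wrapper's frame, not the user's.
outer_naive <- function(data, expr) {
  inner(data, substitute(expr))
}

# Forwarding wrapper: builds the call to inner() and evaluates it in the
# caller's frame, so inner() sees the user's frame as its parent.
outer_forward <- function(data, expr) {
  call <- as.call(list(quote(inner), data, call("quote", substitute(expr))))
  eval(call, envir = parent.frame())
}

# User functions in which `bad` exists only as a local argument.
f_naive <- function(y, bad) {
  d <- data.frame(y = y)
  outer_naive(d, sum(y == bad))
}

f_forward <- function(y, bad) {
  d <- data.frame(y = y)
  outer_forward(d, sum(y == bad))
}

# The naive wrapper cannot see `bad`; capture the error message instead of stopping.
res_naive <- tryCatch(
  f_naive(c(TRUE, FALSE, FALSE), bad = FALSE),
  error = function(e) conditionMessage(e)
)

# The forwarding wrapper resolves `bad` in the user's frame.
res_forward <- f_forward(c(TRUE, FALSE, FALSE), bad = FALSE)

print(res_naive)
print(res_forward)
```

The naive version fails for the same reason as in the question: `y` is found in the data, but `bad` is searched for in the wrapper's frame and then in the global environment, never in the function that actually defines it. The forwarding version uses the same idea as the match.call()/eval() rewrite above, just reduced to a single expression argument.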
Tested with data.table_1.9.2 and dplyr_0.3.0.2.