deparse(substitute()) within function using data.table as argument

If I want to deparse the argument of a function for an error or warning message, something strange happens when the argument is converted to a data.table inside the function:

library(data.table)

e <- data.frame(x = 1:10)
### something strange is happening
foo <- function(u) {
  u <- data.table(u)
  warning(deparse(substitute(u)), " is not a data.table")
  u
}
foo(e)

##  foo(e)
##      x
##  1:  1
##  2:  2
##  3:  3
##  4:  4
##  5:  5
##  6:  6
##  7:  7
##  8:  8
##  9:  9
## 10: 10
## Warning message:
## In foo(e) :
##   structure(list(x = 1:10), .Names = "x", row.names = c(NA, -10L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x10026568>) is not a data.table

If I deparse it before calling data.table, everything works fine:

### ok
foo1 <- function(u) {
  nu <- deparse(substitute(u))
  u <- data.table(u)
  warning(nu, " is not a data.table")
  u
}
## foo1(e)
##      x
##  1:  1
##  2:  2
##  3:  3
##  4:  4
##  5:  5
##  6:  6
##  7:  7
##  8:  8
##  9:  9
## 10: 10
## Warning message:
## In foo1(e) : e is not a data.table

By the way, it makes no difference whether e is already a data.table or not. I came across this while profiling some code in which deparse was very time-consuming because e was quite big.

What's happening here, and how can I write such functions so that they handle both data.frame and data.table input?

nachti asked May 09 '14 11:05


1 Answer

This is because substitute behaves differently for a promise object than for a normal variable. A promise object is what a formal argument is bound to when the function is called, and it has a special slot that contains the expression that generated it. When you use substitute on a promise object inside a function, it retrieves the expression that was supplied for that formal argument in the call. From ?substitute:

Substitution takes place by examining each component of the parse tree as follows: If it is not a bound symbol in env, it is unchanged. If it is a promise object, i.e., a formal argument to a function or explicitly created using delayedAssign(), the expression slot of the promise replaces the symbol. If it is an ordinary variable, its value is substituted, unless env is .GlobalEnv in which case the symbol is left unchanged.
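To make the rule above concrete, here is a minimal base R sketch. The function names f_promise, f_forced and f_overwritten are made up for illustration and are not part of the question.

```r
# u is still the promise created by the call, so substitute()
# returns the expression the caller wrote.
f_promise <- function(u) {
  deparse(substitute(u))
}

# Merely evaluating the promise does not change this:
# u is still bound to the promise, only its value slot is now filled in.
f_forced <- function(u) {
  force(u)
  deparse(substitute(u))
}

# Reassigning u replaces the promise with an ordinary local variable,
# so substitute() now inserts the value of u instead of the expression.
f_overwritten <- function(u) {
  u <- u + 0L
  deparse(substitute(u))
}

x <- 1:3
f_promise(x)      # "x"
f_forced(x)       # "x"
f_overwritten(x)  # the deparsed value of u, not "x"
```

Only the reassignment matters here, not whether the argument has already been evaluated.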

In your case, you actually overwrite the original promise variable with a new one with:

u <- data.table(u)

at which point u becomes a normal variable that contains a data table. When you call substitute on u after this point, substitute returns the data table itself, and deparse then converts it back into the R code that would generate it. For a large object that is a lot of text, which is why it is slow.
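A rough way to see the cost difference, as a sketch only: compare how much text deparse produces for the symbol versus for the object itself. The object big and its size are arbitrary choices for illustration.

```r
# A moderately large data frame (size chosen arbitrarily for the demo)
big <- data.frame(x = runif(1e5))

# Deparsing the symbol yields just its name
nchar_name <- sum(nchar(deparse(quote(big))))

# Deparsing the object yields R code that would recreate every value in it
nchar_value <- sum(nchar(deparse(big)))

nchar_name   # 3, the length of the string "big"
nchar_value  # far larger, and it grows with the number of rows
```

Wrapping the two deparse calls in system.time() should show the same gap in run time, although the exact numbers depend on the machine.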

This also explains why your second example works. You substitute while the variable is still a promise (i.e. before you overwrite u). This is also the answer to your second question. Either substitute before you overwrite your promise, or don't overwrite your promise.
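Both options could look roughly like the sketch below. The names foo_capture and foo_keep are hypothetical, and I am assuming you only want the warning when the input is not already a data.table, which is a change from your original foo (it always warns).

```r
library(data.table)

# Option 1: capture the caller's expression before overwriting u.
foo_capture <- function(u) {
  nm <- deparse(substitute(u))   # u is still a promise here
  if (!is.data.table(u)) {
    warning(nm, " is not a data.table")
    u <- as.data.table(u)        # overwriting is harmless now
  }
  u
}

# Option 2: never overwrite u; keep the converted object under another name.
foo_keep <- function(u) {
  dt <- as.data.table(u)
  if (!is.data.table(u)) {
    # u is still the promise, so substitute() still sees the caller's expression
    warning(deparse(substitute(u)), " is not a data.table")
  }
  dt
}

e <- data.frame(x = 1:10)
foo_capture(e)                 # warns: e is not a data.table
foo_keep(as.data.table(e))     # no warning
```

Either way the warning contains only the short name, so the size of the input no longer affects how long deparse takes.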

For more details, see section 2.1.8 of the R Language Definition (promises) which I excerpt here:

Promise objects are part of R’s lazy evaluation mechanism. They contain three slots: a value, an expression, and an environment. When a function is called the arguments are matched and then each of the formal arguments is bound to a promise. The expression that was given for that formal argument and a pointer to the environment the function was called from are stored in the promise.
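The lazy evaluation part of that excerpt can be illustrated with a small base R sketch. The function lazy_demo is made up for this example.

```r
# The argument expression is stored in the promise and is only
# evaluated when the function first uses the value of u.
lazy_demo <- function(u) {
  expr <- substitute(u)   # reads the expression slot, does not evaluate u
  paste("argument was:", deparse(expr))
}

# The stop() call is never run, because u is never evaluated.
res <- lazy_demo(stop("never raised"))
res  # "argument was: stop(\"never raised\")"
```

This is why substitute is cheap while u is still a promise: it only reads the stored expression and never touches the data.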

BrodieG answered Oct 22 '22 22:10