TL;DR: how can I pass fun.aggregate into dcast.data.table when the call to dcast.data.table is done within a function (to which I pass fun.aggregate)?
I have a table like this:
library(data.table)
t <- data.table(id=rep(1:2, c(3,4)), k=c(rep(letters[1:3], 2), 'c'), v=1:7)
t
   id k v
1:  1 a 1
2:  1 b 2
3:  1 c 3
4:  2 a 4
5:  2 b 5
6:  2 c 6
7:  2 c 7 # note the duplicate (2, c)
I reshape to wide format, retaining the last occurrence of duplicates:
dcast.data.table(t, id ~ k, value.var='v', fun.aggregate=last) # last is in data.table
   id a b c
1:  1 1 2 3
2:  2 4 5 7
However, if I wrap my dcast.data.table call in a function:
f <- function (tbl, fun.aggregate) {
dcast.data.table(tbl, id ~ k, value.var='v', fun.aggregate=fun.aggregate)
}
f(t, last)
Error in `[.data.table`(data, , eval(fun.aggregate), by = c(ff_)) :
could not find function "fun.aggregate"
It looks like the symbol fun.aggregate is being evaluated (eval(fun.aggregate)) and not found (since the function "fun.aggregate" does not exist).
How should I pass my desired fun.aggregate in to f?
(I'm sure it has something to do with quote, substitute, etc., but I struggle greatly with those functions and I typically just chain them together at random until something works.)
Edit:
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-pc-linux-gnu (64-bit)
...
other attached packages:
[1] data.table_1.9.3
Oops, I just realised that this bug is in 1.9.3 (dev version, which I upgraded to to avoid an unrelated bug) and not in 1.9.2 (current CRAN release version).
I would rather not downgrade to 1.9.2 (because of the aforementioned bug I'm avoiding), so in general, is there a way to protect an argument to a function from the eval() call?
This is now fixed in commit 1303 from v 1.9.3 - the current development version. From NEWS:
dcast.data.table handles fun.aggregate argument properly when called from within a function that accepts fun.aggregate argument and passes to dcast.data.table(). Closes #713. Thanks to mathematicalcoffee for reporting here on SO.
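To see why wrapping the call triggered the error, here is a minimal base-R sketch of the same failure mode. It is not data.table's actual code: inner_buggy, inner_fixed, wrapper_buggy and wrapper_fixed are hypothetical names, and the only assumption is that the inner function captures its argument with substitute() and later evaluates the rebuilt call somewhere the wrapper's local binding is not visible.

```r
# Hypothetical stand-ins; this is NOT data.table's actual implementation.

# Buggy version: captures the argument as written by the caller, then
# evaluates the rebuilt call where the wrapper's local binding is invisible.
inner_buggy <- function(x, fun.aggregate) {
  fun_expr <- substitute(fun.aggregate)          # symbol `max` or `fun.aggregate`
  agg_call <- as.call(list(fun_expr, quote(x)))  # max(x) or fun.aggregate(x)
  eval(agg_call, list(x = x), globalenv())
}

# Fixed version: resolves the captured symbol in the caller's frame instead.
inner_fixed <- function(x, fun.aggregate) {
  caller <- parent.frame()
  fun_expr <- substitute(fun.aggregate)
  agg_call <- as.call(list(fun_expr, quote(x)))
  eval(agg_call, list(x = x), caller)
}

wrapper_buggy <- function(x, fun.aggregate) inner_buggy(x, fun.aggregate)
wrapper_fixed <- function(x, fun.aggregate) inner_fixed(x, fun.aggregate)

# Direct call works: the captured expression is the symbol `max`.
direct <- inner_buggy(1:3, max)

# Wrapped call fails: the captured expression is the symbol `fun.aggregate`,
# which is only bound inside wrapper_buggy's frame.
wrapped_error <- tryCatch(wrapper_buggy(1:3, max),
                          error = function(e) conditionMessage(e))

# Evaluating in the caller's frame finds the wrapper's binding.
wrapped_ok <- wrapper_fixed(1:3, max)
```

In the buggy version the wrapped call should fail with a "could not find function" message much like the one in the question, while the fixed version finds the function through the wrapper's frame.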
Note that there was another small oversight in dcast.data.table that's been fixed now - #715.
The issue is that the last function does not produce a length-1 value for all input values, which is a requirement for fun.aggregate.
last(integer(0))
# integer(0)
When the fill argument is not set, the result of calling fun.aggregate on a zero-length vector is the value that's used to fill missing combinations, so it must have length 1. This case was not caught before, but is now fixed.
Here's an example of the (correct) behaviour now:
tt <- t[1:5] # t is from your example
dcast.data.table(tt, id ~ k, fun.aggregate=last)
# Error in dcast.data.table(tt, id ~ k, fun.aggregate = last) :
# Aggregating function provided to argument 'fun.aggregate' should always return
# a length 1 vector, but returns 0-length value for fun.aggregate(integer(0)).
# This value will have to be used to fill missing combinations, if any, and
# therefore can not be of length 0. Either override by setting the 'fill' argument
# explicitly or modify your function to handle this case appropriately.
dcast.data.table(tt, id ~ k, fun.aggregate=last, fill=NA)
#    id a b  c
# 1:  1 1 2  3
# 2:  2 4 5 NA
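The other option the error message mentions, modifying the function to handle the empty case, might look like the sketch below. safe_last is a hypothetical helper, not something data.table provides; it only assumes that returning a single NA of the input's type is an acceptable fill value.

```r
# Hypothetical helper (not part of data.table): behaves like last() on
# non-empty input, but returns one NA of the same type on empty input,
# so the result always has length 1.
safe_last <- function(x) {
  if (length(x) == 0L) x[NA_integer_] else x[length(x)]
}

empty_result <- safe_last(integer(0))  # a single integer NA
full_result  <- safe_last(c(4L, 5L))   # the last element
```

Passing fun.aggregate=safe_last should then satisfy the length-1 requirement without an explicit fill, although this has not been checked against every data.table version.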