
How to pass fun.aggregate as argument to dcast.data.table?

Tags: r, data.table

TL;DR: how can I pass fun.aggregate into dcast.data.table when the call to dcast.data.table is done within a function (to which I pass fun.aggregate)?

I have a table like this:

library(data.table)
t <- data.table(id=rep(1:2, c(3,4)), k=c(rep(letters[1:3], 2), 'c'), v=1:7)
t
   id k v
1:  1 a 1
2:  1 b 2
3:  1 c 3
4:  2 a 4
5:  2 b 5
6:  2 c 6
7:  2 c 7  # note the duplicate (2, c)

I reshape to wide format, retaining the last occurrence of duplicates:

dcast.data.table(t, id ~ k, value.var='v', fun.aggregate=last) # last is in data.table
   id a b c
1:  1 1 2 3
2:  2 4 5 7

However, if I wrap my dcast.data.table call in a function:

f <- function (tbl, fun.aggregate) {
    dcast.data.table(tbl, id ~ k, value.var='v', fun.aggregate=fun.aggregate)
}
f(t, last)
Error in `[.data.table`(data, , eval(fun.aggregate), by = c(ff_)) : 
  could not find function "fun.aggregate"

It looks like the symbol fun.aggregate is being evaluated (eval(fun.aggregate)) and not found (since the function "fun.aggregate" does not exist).
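The symptom can be reproduced in base R without data.table. This is only a sketch of one plausible mechanism, not a description of data.table's actual internals: `inner` and `wrapper` are hypothetical names, and the assumption is that the callee captures the unevaluated argument expression with `substitute()`.

```r
# Hypothetical stand-in for a function that captures its argument
# unevaluated, as non-standard evaluation code often does.
inner <- function(fun) {
  # substitute() returns the expression the caller typed, not its value
  deparse(substitute(fun))
}

# Hypothetical wrapper that forwards its own argument by name.
wrapper <- function(fun.aggregate) {
  inner(fun.aggregate)
}

inner(sum)    # "sum"
wrapper(sum)  # "fun.aggregate"
```

Called directly, `inner` sees the symbol `sum`, which resolves to a real function. Called through `wrapper`, it sees only the symbol `fun.aggregate`; if that symbol is later evaluated in an environment where the wrapper's argument is not visible, the lookup fails in the way the error above describes.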

How should I pass my desired fun.aggregate in to f?

(I'm sure it has something to do with quote, substitute etc but I struggle greatly with those functions and I typically just chain them together at random until something works).


Edit:

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-pc-linux-gnu (64-bit)

...

other attached packages:
[1] data.table_1.9.3

Oops, I just realised that this bug is in 1.9.3 (dev version, which I upgraded to to avoid an unrelated bug) and not in 1.9.2 (current CRAN release version).

I would rather not downgrade to 1.9.2 (aforementioned bug I'm avoiding), so in general is there a way to protect an argument to a function from the eval() call?

asked Jul 03 '14 00:07 by mathematical.coffee


1 Answer

This is now fixed in commit 1303 from v 1.9.3 - the current development version. From NEWS:

dcast.data.table handles fun.aggregate argument properly when called from within a function that accepts fun.aggregate argument and passes to dcast.data.table(). Closes #713. Thanks to mathematicalcoffee for reporting here on SO.


Note that there was another small oversight in dcast.data.table that's been fixed now - #715.

The issue is that the last function does not return a length-1 value for every input, which is a requirement for fun.aggregate:

last(integer(0))
# [1] integer(0)

When the fill argument is not set, the result of calling fun.aggregate on a zero-length input is the value used to fill missing combinations, so it must have length 1. This case was not caught before, but is now fixed.
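Besides setting fill explicitly, the other route is to make the aggregating function itself handle empty input. A minimal sketch, assuming a hypothetical helper named `last_or_na` (not part of data.table) and a small demo table `tt_demo` built only for this illustration:

```r
# Hypothetical helper (not part of data.table): behaves like last() for
# non-empty input, but returns a single typed NA for zero-length input,
# so the result always has length 1.
last_or_na <- function(x) {
  if (length(x) == 0L) {
    return(x[NA_integer_])  # indexing by NA yields one NA of x's type
  }
  x[length(x)]
}

length(last_or_na(integer(0)))  # 1
last_or_na(1:3)                 # 3

# Optional check against data.table, run only if the package is installed.
# dcast() is the generic exported by later data.table releases; whether it
# is available is an assumption about the installed version.
if (requireNamespace("data.table", quietly = TRUE)) {
  tt_demo <- data.table::data.table(
    id = rep(1:2, c(3, 2)),
    k  = c("a", "b", "c", "a", "b"),
    v  = 1:5
  )
  print(data.table::dcast(tt_demo, id ~ k, value.var = "v",
                          fun.aggregate = last_or_na))
}
```

Because the helper never returns a zero-length value, the missing (2, c) combination can be filled without passing fill.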

Here's an example of the (correct) behaviour now:

tt <- t[1:5] # t is from your example
dcast.data.table(tt, id ~ k, fun.aggregate=last)
# Error in dcast.data.table(tt, id ~ k, fun.aggregate = last) : 
#   Aggregating function provided to argument 'fun.aggregate' should always return 
#   a length 1 vector, but returns 0-length value for fun.aggregate(integer(0)). 
#   This value will have to be used to fill missing combinations, if any, and 
#   therefore can not be of length 0. Either override by setting the 'fill' argument 
#   explicitly or modify your function to handle this case appropriately.

dcast.data.table(tt, id ~ k, fun.aggregate=last, fill=NA)
#    id a b  c
# 1:  1 1 2  3
# 2:  2 4 5 NA
answered Oct 13 '22 10:10 by Arun