R foreach with .combine=rbindlist

Tags: r, data.table

I am using foreach with .combine = rbindlist. This fails with an error, although it works fine if I use .combine = rbind.

To illustrate with a simple example:

> t2 <- data.table(col1=c(1,2,3))
> foreach (i=1:3, .combine=rbind) %dopar% unique(t2)
   col1
1:    1
2:    2
3:    3
4:    1
5:    2
6:    3
7:    1
8:    2
9:    3

# But using rbindlist gives an error

> foreach (i=1:3, .combine=rbindlist) %dopar% unique(t2)
error calling combine function:
<simpleError in fun(result.1, result.2): unused argument(s) (result.2)>
NULL

Has anyone been able to make this work?

Thanks in advance.

asked Jul 01 '13 18:07 by xbsd


2 Answers

rbindlist expects a single list argument, but foreach calls the .combine function with the results as separate arguments, fun(result.1, result.2). The error you're getting is the same as this one:

result.1 = data.table(blah = 23)
result.2 = data.table(blah = 34)

rbindlist(result.1, result.2)
#Error in rbindlist(result.1, result.2) : unused argument (result.2)

If you want to utilize rbindlist, the way to do it would be this:

rbindlist(foreach (i = 1:3) %dopar% unique(t2))

or this:

foreach (i=1:3, .combine=function(x,y)rbindlist(list(x,y))) %dopar% unique(t2)
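
As a rough, self-contained sketch of the two options above: the snippet below uses %do% so that it runs without a registered parallel backend (an assumption made only to keep the example runnable; %dopar% should behave the same way once a backend is registered). The variable names combined_wrap and combined_pair are made up for illustration. On backends whose workers do not share the master's loaded packages, you may also need .packages = "data.table" in the foreach call.

```r
library(data.table)
library(foreach)

t2 <- data.table(col1 = c(1, 2, 3))

# Option 1: let foreach return a plain list (its default), then bind it once.
res_list <- foreach(i = 1:3) %do% unique(t2)
combined_wrap <- rbindlist(res_list)

# Option 2: wrap rbindlist so it accepts two separate arguments,
# which is how foreach calls a pairwise .combine function.
combined_pair <- foreach(i = 1:3,
                         .combine = function(x, y) rbindlist(list(x, y))) %do%
  unique(t2)

print(combined_wrap)
print(combined_pair)
```

Both objects should hold the same nine rows; the difference is only in how many times rbindlist gets called along the way.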

answered Oct 31 '22 20:10 by eddi


Here's a way to both use rbindlist as your .combine function and have .multicombine=TRUE:

foreach (i=1:3,
         .combine=function(...) rbindlist(list(...)),
         .multicombine=TRUE) %dopar% unique(t2)

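If you have a large number of separate results to aggregate, this could be quite a bit faster than combining them two at a time, because rbindlist receives many results in one call instead of repeatedly rebinding a growing table.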

For a single foreach statement, this produces the same result as letting foreach default .combine to list and wrapping with rbindlist, as in eddi's first solution. I'm not sure which is faster, though I would expect them to be close.
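
A minimal sketch of that equivalence, assuming %do% in place of %dopar% so it runs without a registered backend; the helper name bind_all and the result names are hypothetical:

```r
library(data.table)
library(foreach)

t2 <- data.table(col1 = c(1, 2, 3))

# Variadic combiner: collects however many results foreach passes in
# and binds them in a single rbindlist call.
bind_all <- function(...) rbindlist(list(...))

via_multicombine <- foreach(i = 1:10,
                            .combine = bind_all,
                            .multicombine = TRUE) %do% unique(t2)

# Same job with the default list result, bound afterwards.
via_wrapping <- rbindlist(foreach(i = 1:10) %do% unique(t2))

print(nrow(via_multicombine))
print(nrow(via_wrapping))
```

If I read the foreach documentation correctly, the number of results handed to the combiner in one call is capped by the .maxcombine argument, so very large jobs may still be combined in several batches.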

For small, single-foreach jobs I like wrapping with rbindlist, but when chaining several foreach calls together with %:% I think the above approach (likely in the first foreach) looks cleaner.
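
One possible arrangement for the nested case, sketched under some assumptions: the loop variables a and b and the helper bind_all are invented for illustration, %do% stands in for %dopar%, and the combiner is applied at both nesting levels here so that each level hands data.tables (not lists of them) to rbindlist.

```r
library(data.table)
library(foreach)

# Variadic combiner reused at both nesting levels.
bind_all <- function(...) rbindlist(list(...))

nested <- foreach(a = 1:2, .combine = bind_all, .multicombine = TRUE) %:%
  foreach(b = 1:3, .combine = bind_all, .multicombine = TRUE) %do%
  data.table(a = a, b = b)

print(nested)
```

The result should be a single data.table with one row per (a, b) pair, with no extra rbindlist wrapper around the whole expression.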

answered Oct 31 '22 21:10 by ClaytonJY