The objective is to create indicators for a factor/string variable in a data frame. The data frame has more than 2 million rows, and because I am running R on Windows, I don't have the option of using plyr with .parallel=T. So I'm taking the "divide and conquer" route with plyr and reshape2.
Using melt and cast runs out of memory, and using
ddply( idata.frame(items) , c("ID") , function(x){
( colSums( model.matrix( ~ x$element - 1) ) > 0 )
} , .progress="text" )
or
ddply( idata.frame(items) , c("ID") , function(x){
( elements %in% x$element )
} , .progress="text" )
takes a while. The %in% version runs faster than the model.matrix call, but the fastest approach so far is the call to tapply below. Do you see a way to speed this up? Thanks.
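set.seed(123)
# Small example: 10 rows, ids and products each drawn from 5 values
dd <- data.frame(
  id  = sample( 1:5, size=10 , replace=TRUE ) ,
  prd = letters[sample( 1:5, size=10 , replace=TRUE )]
)
# All distinct products, used as the indicator columns
prds <- unique(dd$prd)
# For each id, flag which products occur at least once
tapply( dd$prd , dd$id , function(x) prds %in% x )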
For this problem, the packages bigmemory and bigtabulate might be your friends. Here is a slightly more ambitious example:
library(bigmemory)
library(bigtabulate)
library(rbenchmark)  # provides benchmark(), used below
set.seed(123)
dd <- data.frame(
id = sample( 1:15, size=2e6 , replace=T ),
prd = letters[sample( 1:15, size=2e6 , replace=T )]
)
prds <- unique(dd$prd)
benchmark(
bigtable(dd,c(1,2))>0,
table(dd[,1],dd[,2])>0,
xtabs(~id+prd,data=dd)>0,
tapply( dd$prd , dd$id , function(x) prds %in% x )
)
And the results of benchmarking (I'm learning new things all the time):
                                            test replications elapsed relative user.self sys.self user.child sys.child
1                      bigtable(dd, c(1, 2)) > 0          100  54.401 1.000000    51.759    3.817          0         0
2                    table(dd[, 1], dd[, 2]) > 0          100 112.361 2.065422   107.526    6.614          0         0
4 tapply(dd$prd, dd$id, function(x) prds %in% x)          100 178.308 3.277660   166.544   13.275          0         0
3                xtabs(~id + prd, data = dd) > 0          100 229.435 4.217478   217.014   16.660          0         0
And that shows bigtable winning out by a considerable amount. The results are pretty much that all prds are in all IDs, but see ?bigtable for details on the format of the results.
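The benchmarked calls do not all return the same shape: tapply gives a list-like array with one logical vector per id, while the tabulation approaches give an id-by-prd matrix. As a rough sketch (base R only, on the small 10-row example from the question rather than the 2e6-row data), the two can be lined up and compared like this. The names by_id, ind_tapply, ind_table and same are made up for illustration.

```r
set.seed(123)
dd <- data.frame(
  id  = sample(1:5, size = 10, replace = TRUE),
  prd = letters[sample(1:5, size = 10, replace = TRUE)]
)
# Sorted, as character, so the column order is predictable
prds <- sort(unique(as.character(dd$prd)))

# tapply: one logical vector per id, stacked into a matrix
by_id <- tapply(dd$prd, dd$id, function(x) prds %in% x)
ind_tapply <- do.call(rbind, by_id)
colnames(ind_tapply) <- prds

# Cross-tabulation: counts per (id, prd), turned into indicators
ind_table <- unclass(table(dd$id, dd$prd)) > 0

# Compare after aligning rows and columns by name
same <- all(ind_tapply == ind_table[rownames(ind_tapply), prds])
```

If same is TRUE, the two approaches agree on this sample, so the choice between them comes down to speed and the output format you prefer.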