R: table function surprisingly slow

Tags: r

> class(v)
"numeric"
> length(v)
80373285   # 80 million

The entries of v are integers uniformly distributed between 0 and 100.

> ptm  <-  proc.time()
> tv   <-  table(v)
> show(proc.time() - ptm)
   user  system elapsed 
 96.902   0.807  97.761 

Why is the table function so slow on this vector?

Is there a faster function for this simple operation?

By comparison, the bigtable function from bigtabulate is fast:

> library(bigtabulate)
> ptm  <-  proc.time() ;  bt <- bigtable(x = matrix(v,ncol=1), ccols=1) ; show(proc.time() - ptm)
   user  system elapsed 
  4.163   0.120   4.286 

While bigtabulate is a good solution, it seems unwieldy to resort to a special package just for this simple operation. There is also some overhead, because I have to contort the vector into a one-column matrix to make it work with bigtable. Shouldn't there be a simpler, faster solution in base R?

For what it's worth, the base R function cumsum is extremely fast even for this long vector:

> ptm  <-  proc.time() ; cs   <-  cumsum(v) ; show(proc.time() - ptm)
   user  system elapsed 
  0.097   0.117   0.214 
Asked Jul 24 '18 by cmo

1 Answer

Because table calls factor on its input first, and converting 80 million values to a factor is expensive. Try tabulate if all your entries are integers; it skips the factor conversion entirely. Note that tabulate counts occurrences of the values 1, 2, ..., nbins, so you need to add 1 to shift your values from 0..100 into the bins 1..101.
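A minimal sketch of this approach, assuming v holds integer values in 0 to 100 as described in the question:

> tc <- tabulate(as.integer(v) + 1L, nbins = 101L)  # shift 0..100 into bins 1..101
> names(tc) <- 0:100                                # relabel counts with the original values

You can also check that the factor conversion dominates table's runtime by timing it on its own:

> ptm <- proc.time() ; f <- factor(v) ; show(proc.time() - ptm)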

Answered Oct 02 '22 by Zheyuan Li