> class(v)
"numeric"
> length(v)
[1] 80373285 # 80 million
The entries of v are integers uniformly distributed between 0 and 100.
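For context, a vector with the same shape can be generated roughly like this (the seed is arbitrary and this is only an illustrative reproduction, not the original data):
> set.seed(1)  # arbitrary seed, for illustration only
> v <- as.numeric(sample(0:100, 80373285, replace = TRUE))  # doubles, matching class "numeric" above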
> ptm <- proc.time()
> tv <- table(v)
> show(proc.time() - ptm)
user system elapsed
96.902 0.807 97.761
Why is the table function so slow on this vector?
Is there a faster function for this simple operation?
By comparison, the bigtable function from bigtabulate is fast:
> library(bigtabulate)
> ptm <- proc.time() ; bt <- bigtable(x = matrix(v,ncol=1), ccols=1) ; show(proc.time() - ptm)
user system elapsed
4.163 0.120 4.286
While bigtabulate is a good solution, it seems unwieldy to resort to a special package just for this simple operation. There is also some overhead because I have to contort the vector into a one-column matrix to make it work with bigtable. Shouldn't there be a simpler, faster solution in base R?
For what it's worth, the base R function cumsum is extremely fast even on this long vector:
> ptm <- proc.time() ; cs <- cumsum(v) ; show(proc.time() - ptm)
user system elapsed
0.097 0.117 0.214
table is slow because it calls factor on its input first, and building a factor over 80 million values is expensive. Try tabulate if all your entries are integers. Since tabulate only counts positive integers, you need to add 1 so the values run from 1 to 101 instead of 0 to 100.
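A minimal sketch of that approach (variable names are illustrative; counts for value k end up in position k + 1 of the result):
> ptm <- proc.time()
> tab <- tabulate(v + 1, nbins = 101)  # v holds 0..100, so v + 1 gives bins 1..101
> names(tab) <- 0:100                  # label each count with the original value
> show(proc.time() - ptm)
The counts in tab should agree with tv from table(v), just without the overhead of building a factor on 80 million elements.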