> class(v)
"numeric"
> length(v)
80373285 # 80 million
The entries of v are integers uniformly distributed between 0 and 100.
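For a reproducible stand-in (an assumed construction, not the actual data), a vector with the same class and value range can be generated like this; storing the values as doubles is what makes class(v) report "numeric" rather than "integer":
> v <- floor(runif(80373285) * 101)  # whole numbers 0..100, stored as doubles, hence class "numeric"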
> ptm <- proc.time()
> tv <- table(v)
> show(proc.time() - ptm)
user system elapsed
96.902 0.807 97.761
Why is the table function so slow on this vector? Is there a faster function for this simple operation?
By comparison, the bigtable function from the bigtabulate package is fast:
> library(bigtabulate)
> ptm <- proc.time() ; bt <- bigtable(x = matrix(v,ncol=1), ccols=1) ; show(proc.time() - ptm)
user system elapsed
4.163 0.120 4.286
While bigtabulate is a good solution, it seems unwieldy to resort to a special package just for this simple operation. There is also overhead, because I have to contort the vector into a one-column matrix to make it work with bigtable. Shouldn't there be a simpler, faster solution in base R?
For whatever it's worth, the base R function cumsum is extremely fast even for this long vector:
> ptm <- proc.time() ; cs <- cumsum(v) ; show(proc.time() - ptm)
user system elapsed
0.097 0.117 0.214
Because table calls factor first. Try tabulate if all your entries are integers. Note that tabulate counts bins starting at 1, so you need to add 1 to the values so that they run from 1 rather than 0.
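A minimal sketch of that suggestion, assuming (as in the question) that v holds whole numbers in 0..100: time factor(v) on its own to see where table spends its time, then shift the values up by 1, count them with tabulate, and restore the labels that table would have produced.
> ptm <- proc.time() ; f <- factor(v) ; show(proc.time() - ptm)  # the factor() step is what makes table() slow
> counts <- tabulate(v + 1, nbins = 101)  # bin i holds the count of value i - 1; doubles are truncated to integers
> names(counts) <- 0:100                  # recover the value labels that table(v) gives
For this data, where every value from 0 to 100 appears, the counts agree with table(v); the result is just a plain named integer vector rather than a "table" object, and values that never occur would show up as zero counts instead of being dropped.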