Faster proportion tables in R

Question

I am creating proportion tables based on an xts object. As this is part of a large program that (unfortunately) requires around 10^6 loops, it creates quite a bottleneck and I would like to speed it up.

This is an example of what I started with:

library(quantmod)

test.xts <- xts(sample(seq(1,5, by=.5), 50, replace=T), as.Date(1:50))

system.time(for(i in 1:10000){

  prop.table(table(test.xts))

})

>user  system elapsed 
 19.86    0.00   18.58

I have already changed the xts to a matrix and that resulted in a significant speed increase. I only mention that it is originally an xts in case I'm missing something with xts that would speed this up beyond the gains I've already seen converting it to a matrix.

test.mat <- as.matrix(test.xts)

system.time(for(i in 1:10000){

  prop.table(table(test.mat))

})

>user  system elapsed 
 2.78    0.00    2.90

But I'd really like it to be as fast as possible so I'm hoping that others out there have suggestions for further improvements. I'm hoping there's an obvious approach I'm overlooking.

One additional piece of information is that the output from these tables is ultimately merged with similar output from a different time period, so the dimensions need to remain named. (I.e., I need to be able to match the proportion for a value of '10' at time 1 with the proportion of '10' at time 2).

Any help is greatly appreciated.

Simon Urbanek · Accepted Answer

table() implicitly creates a factor which is expensive. In your case you can save a lot (more than 10x) by using tabulate() since you already have integers:

a <- tabulate(test.mat)
names(a) <- seq_along(a)
a / sum(a)
   1    2    3    4    5    6    7    8    9   10 
0.16 0.14 0.08 0.14 0.08 0.16 0.02 0.06 0.10 0.06

timings:

system.time(for(i in 1:10000){
  a <- tabulate(test.mat)
  names(a) <- seq_along(a)
  a/sum(a)
})

 user  system elapsed 
0.208   0.002   0.210

your timing for comparison:

system.time(for(i in 1:10000) prop.table(table(test.mat)))
 user  system elapsed 
3.373   0.028   3.402

Chase · Answer

TO build on joran's comment, using tabulate() directly can prove faster. It does have three quirks worth noting:

It only deals with integers and truncates decimals.
It silently ignores all negative values and zeros.
It creates a bin for all values 1:n, even if there are zero counts

See ?tabulate for details.

With that caveat, here's a function that gives ~9x speed up:

prop2 <- function(x){
  x <- tabulate(x)
  out <- x/sum(x)
  names(out) <- seq_along(out)
  return(out)
}

Test speed:

library(rbenchmark)
test.mat <- as.matrix(test.xts)
f1 <- function() prop.table(table(test.mat))
benchmark(f1(), prop2(test.mat),
           replications = 1000,
           columns = c("test", "relative", "elapsed"),
           order = "relative")
#------
             test relative elapsed
2 prop2(test.mat)      1.0    0.10
1            f1()      9.1    0.91

Confirm output is same:

> prop.table(table(test.mat))
test.mat
   1    2    3    4    5    6    7    8    9   10 
0.04 0.02 0.20 0.12 0.08 0.10 0.06 0.14 0.12 0.12 
> prop2(test.mat)
   1    2    3    4    5    6    7    8    9   10 
0.04 0.02 0.20 0.12 0.08 0.10 0.06 0.14 0.12 0.12

Faster proportion tables in R

Tags:

performance

r

xts

Bryan S

2 Answers

Simon Urbanek

Chase

Recent Activity

Donate For Us

Faster proportion tables in R

Tags:

performance

r

xts

Bryan S

2 Answers

Simon Urbanek

Chase

Related questions

Recent Activity

Donate For Us