I am creating proportion tables based on an xts object. As this is part of a large program that (unfortunately) requires around 10^6 loops, it creates quite a bottleneck and I would like to speed it up.
This is an example of what I started with:
library(quantmod)
test.xts <- xts(sample(seq(1,5, by=.5), 50, replace=T), as.Date(1:50))
system.time(for(i in 1:10000){
prop.table(table(test.xts))
})
>user system elapsed
19.86 0.00 18.58
I have already changed the xts to a matrix and that resulted in a significant speed increase. I only mention that it is originally an xts in case I'm missing something with xts that would speed this up beyond the gains I've already seen converting it to a matrix.
test.mat <- as.matrix(test.xts)
system.time(for(i in 1:10000){
prop.table(table(test.mat))
})
>user system elapsed
2.78 0.00 2.90
But I'd really like it to be as fast as possible so I'm hoping that others out there have suggestions for further improvements. I'm hoping there's an obvious approach I'm overlooking.
One additional piece of information is that the output from these tables is ultimately merged with similar output from a different time period, so the dimensions need to remain named. (I.e., I need to be able to match the proportion for a value of '10' at time 1 with the proportion of '10' at time 2).
Any help is greatly appreciated.
table()
implicitly creates a factor which is expensive. In your case you can save a lot (more than 10x) by using tabulate()
since you already have integers:
a <- tabulate(test.mat)
names(a) <- seq_along(a)
a / sum(a)
1 2 3 4 5 6 7 8 9 10
0.16 0.14 0.08 0.14 0.08 0.16 0.02 0.06 0.10 0.06
timings:
system.time(for(i in 1:10000){
a <- tabulate(test.mat)
names(a) <- seq_along(a)
a/sum(a)
})
user system elapsed
0.208 0.002 0.210
your timing for comparison:
system.time(for(i in 1:10000) prop.table(table(test.mat)))
user system elapsed
3.373 0.028 3.402
TO build on joran's comment, using tabulate()
directly can prove faster. It does have three quirks worth noting:
See ?tabulate
for details.
With that caveat, here's a function that gives ~9x speed up:
prop2 <- function(x){
x <- tabulate(x)
out <- x/sum(x)
names(out) <- seq_along(out)
return(out)
}
Test speed:
library(rbenchmark)
test.mat <- as.matrix(test.xts)
f1 <- function() prop.table(table(test.mat))
benchmark(f1(), prop2(test.mat),
replications = 1000,
columns = c("test", "relative", "elapsed"),
order = "relative")
#------
test relative elapsed
2 prop2(test.mat) 1.0 0.10
1 f1() 9.1 0.91
Confirm output is same:
> prop.table(table(test.mat))
test.mat
1 2 3 4 5 6 7 8 9 10
0.04 0.02 0.20 0.12 0.08 0.10 0.06 0.14 0.12 0.12
> prop2(test.mat)
1 2 3 4 5 6 7 8 9 10
0.04 0.02 0.20 0.12 0.08 0.10 0.06 0.14 0.12 0.12
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With