I've been running through Project Euler trying to write programs that are computationally efficient. Consider problem 1: http://projecteuler.net/problem=1. I've upped the range from 1000 to 10,000,000 to highlight inefficiencies.
This is my solution:
system.time({
    x <- 1:1E7
    a <- sum(as.numeric(x[x %% 3 == 0 | x %% 5 == 0]))
})
   user  system elapsed
  0.980   0.041   1.011
Here is some C++ code written by a friend to do the same thing.
#include <iostream>
using namespace std;
int main(int argc, char** argv)
{
    long long x = 0;                      // long long: the sum overflows a 32-bit long
    for (int i = 1; i <= 10000000; i++)   // <= so the range matches R's 1:1E7
    {
        if (i % 3 == 0)
            x += i;
        else if (i % 5 == 0)
            x += i;
    }
    cout << x << endl;
    return 0;
}
cbaden$ time ./a.out
23333341666668
real 0m0.044s
user 0m0.042s
sys 0m0.001s
I know C++ should be faster than R, but is it really this much faster? Rprof indicates that I'm spending almost 60% of my time in the modulo operator and 13% in the "==" operation. Are there any vectorized ways of doing this faster?
A secondary concern is memory: this approach is not very scalable as the range grows. Is there a good way to do this that preserves the vectorization, yet doesn't try to keep the whole subset in memory?
Modulo is faster when it operates on integers rather than numerics:
f1 <- function() {
    x <- 1:1E7
    a <- sum(as.numeric(x[x %% 3 == 0 | x %% 5 == 0]))
}
f2 <- function() {
    x <- 1:1E7
    a <- sum(as.numeric(x[x %% 3L == 0L | x %% 5L == 0L]))
}
library(rbenchmark)
benchmark(f1(), f2(), replications = 5)
# test replications elapsed relative user.self sys.self user.child sys.child
# 1 f1() 5 14.78 4.976431 13.95 0.67 NA NA
# 2 f2() 5 2.97 1.000000 2.37 0.50 NA NA
That's still far from C++ performance, but it's a step in the right direction.
A faster solution uses the closed-form sum of an arithmetic series plus inclusion-exclusion: add the multiples of 3 and the multiples of 5, then subtract the multiples of 15, which were counted twice.

x <- 1E7
a <- x %/% 3    # count of multiples of 3
b <- x %/% 5    # count of multiples of 5
m <- x %/% 15   # count of multiples of 15 (counted twice above)
ans <- 3 * a * (a + 1) / 2 + 5 * b * (b + 1) / 2 - 15 * m * (m + 1) / 2

though it doesn't really help with regard to the modulo question.