I was doing some very simple benchmarks to compare the performance of C and Rust. I used a function adding the integers 1 + 2 + ... + n (something I could verify by a computation by hand), where n = 10^10.
The code in Rust looks like this:
fn main() {
    let limit: u64 = 10000000000;
    let mut buf: u64 = 0;
    for u64::range(1, limit) |i| {
        buf = buf + i;
    }
    io::println(buf.to_str());
}
The C code is as follows:
#include <stdio.h>

int main()
{
    unsigned long long buf = 0;
    for (unsigned long long i = 0; i < 10000000000; ++i) {
        buf = buf + i;
    }
    printf("%llu\n", buf);
    return 0;
}
I compiled and ran them:
$ rustc sum.rs -o sum_rust
$ time ./sum_rust
13106511847580896768
real 6m43.122s
user 6m42.597s
sys 0m0.076s
$ gcc -Wall -std=c99 sum.c -o sum_c
$ time ./sum_c
13106511847580896768
real 1m3.296s
user 1m3.172s
sys 0m0.024s
Then I tried with optimizations flags on, again both C and Rust:
$ rustc sum.rs -o sum_rust -O
$ time ./sum_rust
13106511847580896768
real 0m0.018s
user 0m0.004s
sys 0m0.012s
$ gcc -Wall -std=c99 sum.c -o sum_c -O9
$ time ./sum_c
13106511847580896768
real 0m16.779s
user 0m16.725s
sys 0m0.008s
These results surprised me. I did expect the optimizations to have some effect, but the optimized Rust version is over 20,000 times faster than the unoptimized one :).
I tried changing n (the only limitation being the range of u64; the run time stayed virtually zero), and even tried a different problem (1^5 + 2^5 + 3^5 + ... + n^5), with similar results: executables compiled with rustc -O are several orders of magnitude faster than those compiled without the flag, and are also many times faster than the same algorithm compiled with gcc -O9.
So my question is: what's going on? :) I could understand a compiler optimizing 1 + 2 + ... + n = (n*n + n)/2, but I can't imagine that any compiler could derive a formula for 1^5 + 2^5 + 3^5 + ... + n^5. On the other hand, as far as I can see, the result must have been computed somehow (and it seems to be correct).
Oh, and:
$ gcc --version
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
$ rustc --version
rustc 0.6 (dba9337 2013-05-10 05:52:48 -0700)
host: i686-unknown-linux-gnu
Yes, compilers do use the 1 + ... + n = n*(n+1)/2 optimisation to remove the loop, and there are similar tricks for any power of the summation variable: the sums of k^1 are the triangular numbers, the sums of k^2 are the pyramidal numbers, the sums of k^3 are the squared triangular numbers, and so on. In general, Faulhaber's formula gives a closed form for ∑ k^p for any p.
You can use a more complicated expression, so that the compiler doesn't have any tricks to remove the loop. e.g.
fn main() {
    let limit: u64 = 1000000000;
    let mut buf: u64 = 0;
    for u64::range(1, limit) |i| {
        buf += i + i ^ (i*i);
    }
    io::println(buf.to_str());
}
and
#include <stdio.h>

int main()
{
    unsigned long long buf = 0;
    for (unsigned long long i = 0; i < 1000000000; ++i) {
        buf += i + i ^ (i * i);
    }
    printf("%llu\n", buf);
    return 0;
}
which gives me
real 0m0.700s
user 0m0.692s
sys 0m0.004s
and
real 0m0.698s
user 0m0.692s
sys 0m0.000s
respectively (with -O
for both compilers).