I was doing some very simple benchmarks to compare the performance of C and Rust. I used a function adding the integers 1 + 2 + ... + n (something I could verify by a computation by hand), where n = 10^10.
The code in Rust looks like this:
fn main() {
    let limit: u64 = 10000000000;
    let mut buf: u64 = 0;
    for u64::range(1, limit) |i| {
        buf = buf + i;
    }
    io::println(buf.to_str());
}
The C code is as follows:
#include <stdio.h>

int main()
{
    unsigned long long buf = 0;
    for (unsigned long long i = 0; i < 10000000000; ++i) {
        buf = buf + i;
    }
    printf("%llu\n", buf);
    return 0;
}
I compiled and ran them:
$ rustc sum.rs -o sum_rust
$ time ./sum_rust
13106511847580896768
real 6m43.122s
user 6m42.597s
sys 0m0.076s
$ gcc -Wall -std=c99 sum.c -o sum_c
$ time ./sum_c
13106511847580896768
real 1m3.296s
user 1m3.172s
sys 0m0.024s
Then I tried with optimizations flags on, again both C and Rust:
$ rustc sum.rs -o sum_rust -O
$ time ./sum_rust
13106511847580896768
real 0m0.018s
user 0m0.004s
sys 0m0.012s
$ gcc -Wall -std=c99 sum.c -o sum_c -O9
$ time ./sum_c
13106511847580896768
real 0m16.779s
user 0m16.725s
sys 0m0.008s
These results surprised me. I did expect the optimizations to have some effect, but the optimized Rust version is over 20,000 times faster than the unoptimized one :).
I tried changing n (the only limitation being the range of u64; the run time stayed virtually zero), and even tried a different problem (1^5 + 2^5 + 3^5 + ... + n^5), with similar results: executables compiled with rustc -O are several orders of magnitude faster than those compiled without the flag, and are also many times faster than the same algorithm compiled with gcc -O9.
So my question is: what's going on? :) I could understand a compiler optimizing 1 + 2 + ... + n = (n*n + n)/2, but I can't imagine that any compiler could derive a formula for 1^5 + 2^5 + 3^5 + ... + n^5. On the other hand, as far as I can see, the result must have been computed somehow (and it seems to be correct).
Oh, and:
$ gcc --version
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
$ rustc --version
rustc 0.6 (dba9337 2013-05-10 05:52:48 -0700)
host: i686-unknown-linux-gnu
Yes, compilers do use the 1 + ... + n = n*(n+1)/2 optimisation to remove the loop, and there are similar tricks for any power of the summation variable: the sums of k^1 are the triangular numbers, the sums of k^2 are the pyramidal numbers, the sums of k^3 are the squared triangular numbers, and so on. In general, Faulhaber's formula gives a closed form for ∑ k^p for any p.
You can use a more complicated expression, so that the compiler doesn't have any tricks to remove the loop. e.g.
fn main() {
    let limit: u64 = 1000000000;
    let mut buf: u64 = 0;
    for u64::range(1, limit) |i| {
        buf += i + i ^ (i*i);
    }
    io::println(buf.to_str());
}
and
#include <stdio.h>

int main()
{
    unsigned long long buf = 0;
    for (unsigned long long i = 0; i < 1000000000; ++i) {
        buf += i + i ^ (i * i);
    }
    printf("%llu\n", buf);
    return 0;
}
which gives me
real 0m0.700s
user 0m0.692s
sys 0m0.004s
and
real 0m0.698s
user 0m0.692s
sys 0m0.000s
respectively (with -O
for both compilers).