
rust vs c performance

Tags:

performance

c

rust

I wanted to learn a bit about Rust tasks, so I wrote a Monte Carlo computation of pi. Now my puzzle is why the single-threaded C version is 4 times faster than the 4-way threaded Rust version. Clearly I am doing something wrong, or my mental performance model is way off.

Here's the C version:

#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>

#define PI 3.1415926535897932

double monte_carlo_pi(int nparts)
{
    int i, in=0;
    double x, y;
    srand(getpid());

    for (i=0; i<nparts; i++) {
        x = (double)rand()/(double)RAND_MAX;
        y = (double)rand()/(double)RAND_MAX;

        if (x*x + y*y < 1.0) {
            in++;
        }
    }

    return in/(double)nparts * 4.0;
}

int main(int argc, char **argv)
{
    int nparts;
    double mc_pi;

    nparts = atoi(argv[1]);
    mc_pi = monte_carlo_pi(nparts);
    printf("computed: %f error: %f\n", mc_pi, mc_pi - PI);
}

The Rust version was not a line-by-line port:

use std::rand;
use std::rand::distributions::{IndependentSample,Range};

fn monte_carlo_pi(nparts: uint ) -> uint {
    let between = Range::new(0f64,1f64);
    let mut rng = rand::task_rng();
    let mut in_circle = 0u;
    for _ in range(0u, nparts) {
        let a = between.ind_sample(&mut rng);
        let b = between.ind_sample(&mut rng);

        if a*a + b*b <= 1.0 {
            in_circle += 1;
        }
    }
    in_circle
}

fn main() {
    let (tx, rx) = channel();

    let ntasks = 4u;
    let nparts = 100000000u; /* I haven't learned how to parse command-line args yet! */
    for _ in range(0u, ntasks) {
        let child_tx = tx.clone();
        spawn(proc() {
            child_tx.send(monte_carlo_pi(nparts/ntasks));
        });
    }

    let result = rx.recv() + rx.recv() + rx.recv() + rx.recv();

    println!("pi is {}", (result as f64)/(nparts as f64)*4.0);
}

Build and time the C version:

$ clang -O2 mc-pi.c -o mc-pi-c; time ./mc-pi-c 100000000
computed: 3.141700 error: 0.000108
./mc-pi-c 100000000  1.68s user 0.00s system 99% cpu 1.683 total

Build and time the Rust version:

$ rustc -v      
rustc 0.12.0-nightly (740905042 2014-09-29 23:52:21 +0000)
$ rustc --opt-level 2 --debuginfo 0 mc-pi.rs -o mc-pi-rust; time ./mc-pi-rust  
pi is 3.141327
./mc-pi-rust  2.40s user 24.56s system 352% cpu 7.654 total

asked Oct 09 '14 14:10

Rob Latham

2 Answers

The bottleneck, as Dogbert observed, was the random number generator. Here's one that is fast and seeded differently on each thread:

fn monte_carlo_pi(id: u32, nparts: uint ) -> uint {
    ...
    let mut rng: XorShiftRng = SeedableRng::from_seed([id,id,id,id]);
    ...
}
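
For readers on current Rust, where `uint`, `proc()`, `spawn` and `std::rand` no longer exist, here is a minimal sketch of the same idea: one cheap generator per thread, each seeded from the thread's id. It assumes a hand-rolled xorshift64 so that it builds with the standard library alone; `XorShift64` and `estimate_pi` are illustrative names, not the old `XorShiftRng` API. An xorshift state of all zeros never changes, so the sketch derives the seed from `id + 1` instead of using the id directly.

```rust
use std::sync::mpsc;
use std::thread;

/// Minimal xorshift64 generator (illustrative helper, not a library type).
struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    /// The state must be nonzero, otherwise every output is zero.
    fn new(seed: u64) -> Self {
        XorShift64 { state: if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed } }
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }

    /// Uniform f64 in [0, 1), built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Count how many of `nparts` random points fall inside the quarter circle.
fn monte_carlo_pi(id: u64, nparts: u64) -> u64 {
    // Distinct nonzero seed per worker (assumption: ids are small integers).
    let mut rng = XorShift64::new((id + 1).wrapping_mul(0x9E37_79B9_7F4A_7C15));
    let mut in_circle = 0u64;
    for _ in 0..nparts {
        let a = rng.next_f64();
        let b = rng.next_f64();
        if a * a + b * b <= 1.0 {
            in_circle += 1;
        }
    }
    in_circle
}

/// Split the work across `nthreads` threads and combine the counts.
fn estimate_pi(nthreads: u64, nparts: u64) -> f64 {
    let (tx, rx) = mpsc::channel();
    let per_thread = nparts / nthreads;
    for id in 0..nthreads {
        let tx = tx.clone();
        thread::spawn(move || {
            tx.send(monte_carlo_pi(id, per_thread)).unwrap();
        });
    }
    drop(tx); // close the channel so the iterator below terminates
    let hits: u64 = rx.iter().sum();
    hits as f64 / (per_thread * nthreads) as f64 * 4.0
}

fn main() {
    println!("pi is {}", estimate_pi(4, 100_000_000));
}
```

Build it with optimizations enabled (for example `rustc -O`) before timing it. The generator lives entirely on each thread's stack, so the threads share nothing until they send their final counts.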

answered Oct 18 '22 18:10

Rob Latham

Meaningful benchmarks are a tricky thing, because you have all kinds of optimization options, etc. Also, the structure of the code can have a huge impact.

Comparing C and Rust is a little like comparing apples and oranges. We typically use compute-intensive algorithms like the one you depict above, but the real world can throw you a curve.

Having said that, Rust can and does approach the performance of C and C++, and most likely can do better on concurrency tasks.

Take a look at the benchmarks here:

https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/rust-clang.html

I chose the Rust vs. C clang benchmark comparison because both rely on the underlying LLVM.

[Image: Rust vs. C clang benchmark results, https://i.stack.imgur.com/ln8nu.png]

On the other hand, a comparison with C gcc yields different results:

[Image: Rust vs. C gcc benchmark results, https://i.stack.imgur.com/7W3Ng.png]

And guess what? Rust still comes out ahead!

I entreat you to explore the Benchmarks Game site in more detail. There are some cases where C will edge out Rust.

In general, when you are creating a real-world solution, you want to do performance benchmarks for your specific cases. Always do this, because you will often be surprised by the results. Never assume.
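
As a rough illustration of measuring your own case, here is a minimal timing sketch in Rust. The `best_of` helper and the `sum_of_squares` workload are hypothetical stand-ins; swap in the code you actually care about. It assumes a toolchain recent enough to provide `std::hint::black_box` (Rust 1.66 or later), which is used here to discourage the optimizer from deleting the work being measured.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Run `work` `runs` times; return the fastest wall-clock time and the last result.
fn best_of<F: FnMut() -> u64>(runs: u32, mut work: F) -> (Duration, u64) {
    let mut best = Duration::MAX;
    let mut last = 0;
    for _ in 0..runs {
        let start = Instant::now();
        last = black_box(work());
        best = best.min(start.elapsed());
    }
    (best, last)
}

/// Stand-in workload (hypothetical): wrapping sum of squares below `n`.
fn sum_of_squares(n: u64) -> u64 {
    (0..n).fold(0u64, |acc, i| acc.wrapping_add(i.wrapping_mul(i)))
}

fn main() {
    let (best, result) = best_of(5, || sum_of_squares(black_box(10_000_000)));
    println!("result {} best of 5: {:?}", result, best);
}
```

Taking the best of several runs reduces noise from the scheduler and cold caches. Compare builds made with the same optimization level, and time the whole program with `time` as well, since a split between user and system time like the one in the question can point straight at the culprit.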

I think that too many times, benchmarks are used to advance the "my language is better than your language" style of wars. But as one who has used over 20 computer languages throughout his longish career, I always say that it is a matter of the best tool for the job.

answered Oct 18 '22 18:10

Lord Alveric
