Collatz conjecture in Rust: functional v imperative approach

I wanted to play around with some good old Collatz conjecture and decided that it would be fun to do it in a (very) functional style, so I implemented an unfoldr function, close to the one Haskell has:

fn unfoldr<F, T>(foo: F, seed: T, mut vec: Vec<T>) -> Vec<T>
    where F: Fn(T) -> Option<(T, T)>
{
    if let Some((x, y)) = foo(seed) {
        vec.push(x);
        unfoldr(foo, y, vec)
    } else {
        vec
    }
}

The rest was pretty straightforward:

fn collatz_next(n: u64) -> u64 {
    if n % 2 == 0 { n / 2 } else { 3 * n + 1 }
}

pub fn collatz_seq_f(n: u64) -> Vec<u64> {
    unfoldr(|n| if n == 1 { None } else { Some((n, collatz_next(n))) }, n, Vec::new())
}

Here collatz_seq_f returns a vector containing the sequence starting at a given number n.
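As a quick sanity check (my addition, not part of the original post), here is the unfoldr version run on a small input; the sequence stops just before reaching 1:

```rust
fn collatz_next(n: u64) -> u64 {
    if n % 2 == 0 { n / 2 } else { 3 * n + 1 }
}

fn unfoldr<F, T>(foo: F, seed: T, mut vec: Vec<T>) -> Vec<T>
    where F: Fn(T) -> Option<(T, T)>
{
    if let Some((x, y)) = foo(seed) {
        vec.push(x);
        unfoldr(foo, y, vec)
    } else {
        vec
    }
}

fn collatz_seq_f(n: u64) -> Vec<u64> {
    unfoldr(|n| if n == 1 { None } else { Some((n, collatz_next(n))) }, n, Vec::new())
}

fn main() {
    // 6 → 3 → 10 → 5 → 16 → 8 → 4 → 2 → 1; the 1 itself is not pushed
    assert_eq!(collatz_seq_f(6), vec![6, 3, 10, 5, 16, 8, 4, 2]);
}
```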

I wondered, however, whether Rust approves of this style, and implemented a simple imperative counterpart:

pub fn collatz_seq_i(n: u64, mut vec: Vec<u64>) -> Vec<u64> {
    let mut c = n;
    while c != 1 {
        vec.push(c);
        c = collatz_next(c);
    }
    vec
}

And compared them with cargo bench (0.13.0-nightly (2ef3cde 2016-09-04)). I was a bit disappointed that my fun unfoldr approach was only half as fast as the imperative implementation:

running 3 tests
test tests::it_works ... ignored
test tests::bench_collatz_functional ... bench:         900 ns/iter (+/- 47)
test tests::bench_collatz_imperative ... bench:         455 ns/iter (+/- 29)

test result: ok. 0 passed; 0 failed; 1 ignored; 2 measured

I know that the unfoldr version is more abstract, but I didn't expect that much of a difference; is there anything that I could change to make it faster?

Full code below:

#![feature(test)]

extern crate test;

fn unfoldr<F, T>(foo: F, seed: T, mut vec: Vec<T>) -> Vec<T>
    where F: Fn(T) -> Option<(T, T)>
{
    if let Some((x, y)) = foo(seed) {
        vec.push(x);
        unfoldr(foo, y, vec)
    } else {
        vec
    }
}

fn collatz_next(n: u64) -> u64 {
    if n % 2 == 0 { n / 2 } else { 3 * n + 1 }
}

pub fn collatz_seq_f(n: u64) -> Vec<u64> {
    unfoldr(|n| if n == 1 { None } else { Some((n, collatz_next(n))) }, n, Vec::new())
}

pub fn collatz_seq_i(n: u64, mut vec: Vec<u64>) -> Vec<u64> {
    let mut c = n;
    while c != 1 {
        vec.push(c);
        c = collatz_next(c);
    }
    vec
}

#[cfg(test)]
mod tests {
    use super::*;
    use test::Bencher;

    #[test]
    fn it_works() {
        assert_eq!(110, collatz_seq_f(27).len());
        assert_eq!(110, collatz_seq_i(27, Vec::new()).len());
    }

    #[bench]
    fn bench_collatz_functional(b: &mut Bencher) {
        b.iter(|| collatz_seq_f(27));
    }

    #[bench]
    fn bench_collatz_imperative(b: &mut Bencher) {
        b.iter(|| collatz_seq_i(27, Vec::new()));
    }
}
asked Sep 05 '16 by ljedrz

2 Answers

This is not an answer, but an additional test to narrow down where the performance hit is coming from. I removed the Some overhead by writing a plain recursive function:

pub fn collatz_seq_r(n: u64, mut vec: Vec<u64>) -> Vec<u64> {
    if n == 1 {
        vec
    } else {
        vec.push(n);
        collatz_seq_r(collatz_next(n), vec)
    } 
}

I obtained nearly identical performance to the collatz_seq_f example. It appears that LLVM is not turning this recursive call into a loop.

Alternative

After thinking about how I would do this in Rust, I would most likely implement an iterator whose job is to repeatedly apply a function to the previous value, producing a non-terminating sequence: n, f(n), f(f(n)), ..., f^k(n), .... This can be done like this:

struct Compose<T, F> {
    value: T,
    func: F
}

impl<T, F> Iterator for Compose<T, F> 
    where T: Copy,
          F: Fn(T) -> T {
    type Item = T;

    fn next(&mut self) -> Option<T> {
        let res = self.value;                    // f^k(n)
        self.value = (self.func)(self.value);    // f^{k+1}(n)
        Some(res)
    }
}

impl<T, F> Compose<T, F> {
    fn new(seed: T, func: F) -> Compose<T, F> {
        Compose { 
            value: seed,
            func: func
        }
    }
}

So here I can call Compose::new(seed_value, function) to get an iterator of compositions. Generating a Collatz sequence then becomes:

pub fn collatz_seq_iter(n: u64) -> Vec<u64> {
    Compose::new(n, collatz_next)
             .take_while(|&n| n != 1)
             .collect::<Vec<_>>()
}

With this I get the benchmarks:

test tests::bench_collatz_functional ... bench:         867 ns/iter (+/- 28)
test tests::bench_collatz_imperative ... bench:         374 ns/iter (+/- 9)
test tests::bench_collatz_iterators  ... bench:         473 ns/iter (+/- 9)
test tests::bench_collatz_recursive  ... bench:         838 ns/iter (+/- 29)
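As an aside (my addition, not part of the original answer): newer Rust versions stabilized std::iter::successors, which expresses the same n, f(n), f(f(n)), ... idea without a hand-written iterator struct. A sketch:

```rust
use std::iter::successors;

fn collatz_next(n: u64) -> u64 {
    if n % 2 == 0 { n / 2 } else { 3 * n + 1 }
}

// Equivalent to Compose::new(n, collatz_next) followed by take_while.
fn collatz_seq_succ(n: u64) -> Vec<u64> {
    successors(Some(n), |&n| Some(collatz_next(n)))
        .take_while(|&n| n != 1)
        .collect()
}

fn main() {
    assert_eq!(collatz_seq_succ(6), vec![6, 3, 10, 5, 16, 8, 4, 2]);
}
```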

But the interesting thing here is: if you decide that you only care about the length after all, the call Compose::new(n, collatz_next).take_while(|&n| n != 1).count() as u64 has almost exactly the same performance as removing the vec.push(c) line from the imperative approach:

test tests::bench_collatz_imperative ... bench:         162 ns/iter (+/- 6)
test tests::bench_collatz_iterators  ... bench:         163 ns/iter (+/- 4)
answered Nov 03 '22 by breeden


This answer covers some implementation details of why unfoldr is a bit slow.

I proposed a different variant, and @breeden helped me verify that it is an improvement that matches the performance of the imperative variant. It preserves the recursion, but we can't really call it functional anymore.[^1]

fn unfoldr2<F, T>(foo: F, seed: T, vec: &mut Vec<T>)
    where F: Fn(T) -> Option<(T, T)>
{
    if let Some((x, y)) = foo(seed) {
        vec.push(x);
        unfoldr2(foo, y, vec)
    }
}

fn collatz_next(n: u64) -> u64 {
    if n % 2 == 0 { n / 2 } else { 3 * n + 1 }
}

pub fn collatz_seq_f(n: u64) -> Vec<u64> {
    let mut v = Vec::new();
    unfoldr2(|n| if n == 1 { None } else { Some((n, collatz_next(n))) }, n, &mut v);
    v
}

The difference here illustrates what “went wrong” with the first version: in unfoldr, a vec value is carried around by value, while in unfoldr2 there is just a mutable reference to a vector.

Owning the vec value has a cost in unfoldr, and you discovered how it limits the compiler: unwinding. Unwinding is what happens when a function panics. If a panic unwinds through the unfoldr function, all local variables must be dropped, and that includes vec. Special code (called a “landing pad”) is inserted to handle this, and every function call that may panic gets an instruction to divert to the landing pad on panic.

So in unfoldr:

  1. There's a local variable that has a destructor, vec
  2. There's a function call that may panic (vec.push panics on capacity overflow)
  3. There's a landing pad that drops vec and resumes unwinding

Additionally, there's code to move the Vec value around. (It is copied to the stack to be available for the landing pad code).

unfoldr2 doesn't get any magical recursion-into-a-loop optimization either, but it still has less code, because it has no need to handle unwinding or to move the Vec.

[^1]: Can we salvage the functional-ness by treating vec.push(x) as an interface to a stream / generator / output iterator, or just a callback?
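A sketch of the callback idea from that footnote (my addition, not code from the original answer): the recursion stays, but the sink is abstracted to any FnMut, so the caller decides whether values go into a Vec, a counter, or anywhere else.

```rust
fn collatz_next(n: u64) -> u64 {
    if n % 2 == 0 { n / 2 } else { 3 * n + 1 }
}

// unfoldr variant that emits values through a callback instead of owning a Vec.
fn unfoldr_cb<F, S, T>(foo: F, seed: T, sink: &mut S)
    where F: Fn(T) -> Option<(T, T)>,
          S: FnMut(T)
{
    if let Some((x, y)) = foo(seed) {
        sink(x);
        unfoldr_cb(foo, y, sink)
    }
}

fn main() {
    let mut v = Vec::new();
    unfoldr_cb(|n| if n == 1 { None } else { Some((n, collatz_next(n))) },
               6u64, &mut |x| v.push(x));
    assert_eq!(v, vec![6, 3, 10, 5, 16, 8, 4, 2]);
}
```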

answered Nov 03 '22 by bluss