I wanted to play around with the good old Collatz conjecture and decided that it would be fun to do it in a (very) functional style, so I implemented an unfoldr function close to the one Haskell has:
fn unfoldr<F, T>(foo: F, seed: T, mut vec: Vec<T>) -> Vec<T>
    where F: Fn(T) -> Option<(T, T)>
{
    if let Some((x, y)) = foo(seed) {
        vec.push(x);
        unfoldr(foo, y, vec)
    } else {
        vec
    }
}
The rest was pretty straightforward:
fn collatz_next(n: u64) -> u64 {
    if n % 2 == 0 { n / 2 } else { 3 * n + 1 }
}

pub fn collatz_seq_f(n: u64) -> Vec<u64> {
    unfoldr(|n| if n == 1 { None } else { Some((n, collatz_next(n))) }, n, Vec::new())
}
Here collatz_seq_f returns a Vec containing the sequence starting with a given number n.
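For illustration (this check is my own, not from the original post), note that the resulting vector stops just before reaching 1:

fn main() {
    // 6 → 3 → 10 → 5 → 16 → 8 → 4 → 2 (the final 1 is not included)
    assert_eq!(collatz_seq_f(6), vec![6, 3, 10, 5, 16, 8, 4, 2]);
}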
I wondered, however, whether Rust approves of this style, and implemented a simple imperative counterpart:
pub fn collatz_seq_i(n: u64, mut vec: Vec<u64>) -> Vec<u64> {
    let mut c = n;
    while c != 1 {
        vec.push(c);
        c = collatz_next(c);
    }
    vec
}
I compared them with cargo bench (0.13.0-nightly (2ef3cde 2016-09-04)) and was a bit disappointed that my fun unfoldr approach was only half as fast as the imperative implementation:
running 3 tests
test tests::it_works ... ignored
test tests::bench_collatz_functional ... bench: 900 ns/iter (+/- 47)
test tests::bench_collatz_imperative ... bench: 455 ns/iter (+/- 29)
test result: ok. 0 passed; 0 failed; 1 ignored; 2 measured
I know that the unfoldr
version is more abstract, but I didn't expect that much of a difference; is there anything that I could change to make it faster?
Full code below:
#![feature(test)]
extern crate test;

fn unfoldr<F, T>(foo: F, seed: T, mut vec: Vec<T>) -> Vec<T>
    where F: Fn(T) -> Option<(T, T)>
{
    if let Some((x, y)) = foo(seed) {
        vec.push(x);
        unfoldr(foo, y, vec)
    } else {
        vec
    }
}

fn collatz_next(n: u64) -> u64 {
    if n % 2 == 0 { n / 2 } else { 3 * n + 1 }
}

pub fn collatz_seq_f(n: u64) -> Vec<u64> {
    unfoldr(|n| if n == 1 { None } else { Some((n, collatz_next(n))) }, n, Vec::new())
}

pub fn collatz_seq_i(n: u64, mut vec: Vec<u64>) -> Vec<u64> {
    let mut c = n;
    while c != 1 {
        vec.push(c);
        c = collatz_next(c);
    }
    vec
}

#[cfg(test)]
mod tests {
    use super::*;
    use test::Bencher;

    #[test]
    fn it_works() {
        assert_eq!(110, collatz_seq_f(27).len());
        assert_eq!(110, collatz_seq_i(27, Vec::new()).len());
    }

    #[bench]
    fn bench_collatz_functional(b: &mut Bencher) {
        b.iter(|| collatz_seq_f(27));
    }

    #[bench]
    fn bench_collatz_imperative(b: &mut Bencher) {
        b.iter(|| collatz_seq_i(27, Vec::new()));
    }
}
This is not an answer, but an additional test to narrow down where the performance hit is coming from. I removed the Some overhead by writing the recursive function directly:
pub fn collatz_seq_r(n: u64, mut vec: Vec<u64>) -> Vec<u64> {
    if n == 1 {
        vec
    } else {
        vec.push(n);
        collatz_seq_r(collatz_next(n), vec)
    }
}
I obtained nearly identical performance to the collatz_seq_f example. It appears that LLVM is not optimizing this recursive call into a loop.
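For comparison, here is what that recursion-to-loop rewrite would look like done by hand; this is a sketch of mine (unfoldr_loop is not one of the benchmarked variants):

fn unfoldr_loop<F, T>(foo: F, mut seed: T, mut vec: Vec<T>) -> Vec<T>
    where F: Fn(T) -> Option<(T, T)>
{
    // The tail call becomes a loop: each iteration pushes the produced
    // value and continues from the new seed.
    while let Some((x, y)) = foo(seed) {
        vec.push(x);
        seed = y;
    }
    vec
}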
After thinking about how I would do this in Rust, I would most likely have implemented an iterator whose job is to repeatedly compose the previous value with a function, producing a non-terminating sequence: n, f(n), f(f(n)), ..., f^k(n), .... This can be done like this:
struct Compose<T, F> {
    value: T,
    func: F,
}

impl<T, F> Iterator for Compose<T, F>
    where T: Copy,
          F: Fn(T) -> T
{
    type Item = T;

    fn next(&mut self) -> Option<T> {
        let res = self.value;                 // f^k(n)
        self.value = (self.func)(self.value); // f^{k+1}(n)
        Some(res)
    }
}

impl<T, F> Compose<T, F> {
    fn new(seed: T, func: F) -> Compose<T, F> {
        Compose {
            value: seed,
            func: func,
        }
    }
}
So here I can call Compose::new(seed_value, function) to get an iterator of compositions. Generating a Collatz sequence then becomes:
pub fn collatz_seq_iter(n: u64) -> Vec<u64> {
    Compose::new(n, collatz_next)
        .take_while(|&n| n != 1)
        .collect::<Vec<_>>()
}
With this I get the benchmarks:
test tests::bench_collatz_functional ... bench: 867 ns/iter (+/- 28)
test tests::bench_collatz_imperative ... bench: 374 ns/iter (+/- 9)
test tests::bench_collatz_iterators ... bench: 473 ns/iter (+/- 9)
test tests::bench_collatz_recursive ... bench: 838 ns/iter (+/- 29)
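As an aside for readers on newer Rust: since 1.34 the standard library's std::iter::successors expresses the same compose-and-iterate pattern without a hand-rolled struct. A sketch (collatz_seq_succ is my own name, not part of the benchmarks):

use std::iter::successors;

pub fn collatz_seq_succ(n: u64) -> Vec<u64> {
    // successors yields n, f(n), f(f(n)), ... — an infinite stream here,
    // cut off by take_while just like the Compose version.
    successors(Some(n), |&c| Some(collatz_next(c)))
        .take_while(|&c| c != 1)
        .collect()
}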
But the interesting thing here is: if you decide that you only care about the length after all, the call Compose::new(n, collatz_next).take_while(|&n| n != 1).count() as u64 has almost exactly the same performance as removing the vec.push(c) line from the imperative approach:
test tests::bench_collatz_imperative ... bench: 162 ns/iter (+/- 6)
test tests::bench_collatz_iterators ... bench: 163 ns/iter (+/- 4)
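Spelled out as a function (the name collatz_seq_len is mine, not from the benchmark code):

pub fn collatz_seq_len(n: u64) -> u64 {
    // Counting the iterator's elements needs no Vec at all, which is why
    // this matches the push-free imperative loop.
    Compose::new(n, collatz_next)
        .take_while(|&n| n != 1)
        .count() as u64
}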
This is going to contain some implementation details of why unfoldr is a bit slow.

I proposed a different variant, and @breeden helped me verify that it is an improvement that makes it match the performance of the imperative variant. It does preserve the recursion, but we can't call it functional anymore. [^1]
fn unfoldr2<F, T>(foo: F, seed: T, vec: &mut Vec<T>)
    where F: Fn(T) -> Option<(T, T)>
{
    if let Some((x, y)) = foo(seed) {
        vec.push(x);
        unfoldr2(foo, y, vec)
    }
}

fn collatz_next(n: u64) -> u64 {
    if n % 2 == 0 { n / 2 } else { 3 * n + 1 }
}

pub fn collatz_seq_f(n: u64) -> Vec<u64> {
    let mut v = Vec::new();
    unfoldr2(|n| if n == 1 { None } else { Some((n, collatz_next(n))) }, n, &mut v);
    v
}
The difference here will illustrate what “went wrong” with the first version. In unfoldr
, there is a vec value being carried around, and in unfoldr2
there is just a mutable reference to a vector instead.
The vec value has an effect in unfoldr, and you have discovered how it limits the compiler: unwinding. Unwinding is what happens if a function panics. If a panic unwinds through the unfoldr function, all local variables must be dropped, and that includes vec. Special code (a “landing pad”) is inserted to deal with this, and every function call that may panic gets an instruction to divert to the landing pad on panic.
So in unfoldr:

- there is a landing pad for the local vec
- the call to vec.push may panic (it panics on capacity overflow)
- on panic, the landing pad drops vec and resumes unwinding

Additionally, there's code to move the Vec value around. (It is copied to the stack to be available for the landing pad code.)
unfoldr2 doesn't get any magic recursion-into-just-a-loop optimization either, but it still has less code, because it has no need to handle unwinding or to move the Vec.
[^1]: Can we salvage the functional-ness by imagining vec.push(x) as an interface to a stream / generator / output iterator, or just a callback?
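One possible reading of that footnote, as a sketch of mine (unfoldr_cb and sink are hypothetical names, not from the code above): keep the recursion, but hand each value to a callback instead of pushing into a vector the function owns.

fn unfoldr_cb<F, T, S>(foo: F, seed: T, sink: &mut S)
    where F: Fn(T) -> Option<(T, T)>,
          S: FnMut(T)
{
    // The callback stands in for vec.push; the caller decides what
    // “emit a value” means.
    if let Some((x, y)) = foo(seed) {
        sink(x);
        unfoldr_cb(foo, y, sink);
    }
}

pub fn collatz_seq_cb(n: u64) -> Vec<u64> {
    let mut v = Vec::new();
    unfoldr_cb(|n| if n == 1 { None } else { Some((n, collatz_next(n))) },
               n,
               &mut |x| v.push(x));
    v
}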