
What are the performance impacts of 'functional' Rust?

Tags:

functional-programming

imperative-programming

rust

I am following the Rust track on Exercism.io. I have a fair amount of C/C++ experience. I like the 'functional' elements of Rust but I'm concerned about the relative performance.

I solved the 'run length encoding' problem:

pub fn encode(source: &str) -> String {
    let mut retval = String::new();
    let firstchar = source.chars().next();
    let mut currentchar = match firstchar {
        Some(x) => x,
        None => return retval,
    };
    let mut currentcharcount: u32 = 0;
    for c in source.chars() {
        if c == currentchar {
            currentcharcount += 1;
        } else {
            if currentcharcount > 1 {
                retval.push_str(&currentcharcount.to_string());
            }
            retval.push(currentchar);
            currentchar = c;
            currentcharcount = 1;
        }
    }
    if currentcharcount > 1 {
        retval.push_str(&currentcharcount.to_string());
    }
    retval.push(currentchar);
    retval
}

I noticed that one of the top-rated answers looked more like this:

extern crate itertools;

use itertools::Itertools;

pub fn encode(data: &str) -> String {
    data.chars()
        .group_by(|&c| c)
        .into_iter()
        .map(|(c, group)| match group.count() {
            1 => c.to_string(),
            n => format!("{}{}", n, c),
        })
        .collect()
}

I love the top-rated solution; it is simple, functional, and elegant. This is what they promised me Rust would be all about. Mine, on the other hand, is gross and full of mutable variables. You can tell I'm used to C++.

My problem is that the functional style has a SIGNIFICANT performance impact. I tested both versions with the same 4 MB of random data, encoded 1000 times. My imperative solution took under 10 seconds; the functional solution took about 2 minutes 30 seconds.

Why is the functional style so much slower than the imperative style?
Is there some problem with the functional implementation which is causing such a huge slowdown?
If I want to write high performance code, should I ever use this functional style?

325

asked Apr 14 '19 12:04

David Copernicus Bowie

2 Answers

TL;DR

A functional implementation can be faster than your original procedural implementation, in certain cases.

Why is the functional style so much slower than the imperative style? Is there some problem with the functional implementation which is causing such a huge slowdown?

As Matthieu M. already pointed out, the important thing to note is that the algorithm matters. How that algorithm is expressed (procedural, imperative, object-oriented, functional, declarative) generally doesn't matter.

I see two main issues with the functional code:

Allocating numerous strings over and over is inefficient. In the original functional implementation, this is done via to_string and format!, once per run of characters.
There's the overhead of using group_by, which exists to produce a nested iterator; you don't need that just to get the counts.
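To make the first point concrete, here is a minimal std-only sketch that contrasts the two ways of building the output. The function names render_allocating and render_in_place are hypothetical, and the runs are passed in as a pre-computed slice (an assumption made only to isolate the string-building step from the grouping step):

```rust
use std::fmt::Write;

/// Builds one temporary `String` per run, then copies each into the result.
pub fn render_allocating(runs: &[(char, usize)]) -> String {
    runs.iter()
        .map(|&(c, count)| match count {
            1 => c.to_string(),
            n => format!("{}{}", n, c),
        })
        .collect()
}

/// Appends every run directly to a single output buffer.
pub fn render_in_place(runs: &[(char, usize)]) -> String {
    let mut out = String::new();
    for &(c, count) in runs {
        if count > 1 {
            // Writing to a `String` cannot fail, so this `unwrap` never panics.
            write!(out, "{}", count).unwrap();
        }
        out.push(c);
    }
    out
}
```

Both functions return the same text; the second one only ever grows a single buffer, so there are no short-lived intermediate strings for the allocator to create and free.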

Using more of itertools (batching, take_while_ref, format_with) brings the two implementations much closer:

pub fn encode_slim(data: &str) -> String {
    data.chars()
        .batching(|it| {
            it.next()
                .map(|v| (v, it.take_while_ref(|&v2| v2 == v).count() + 1))
        })
        .format_with("", |(c, count), f| match count {
            1 => f(&c),
            n => f(&format_args!("{}{}", n, c)),
        })
        .to_string()
}

A benchmark of 4MiB of random alphanumeric data, compiled with RUSTFLAGS='-C target-cpu=native':

encode (procedural)     time:   [21.082 ms 21.620 ms 22.211 ms]

encode (fast)           time:   [26.457 ms 27.104 ms 27.882 ms]
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

If you are interested in creating your own iterator, you can mix-and-match the procedural code with more functional code:

struct RunLength<I> {
    iter: I,
    saved: Option<char>,
}

impl<I> RunLength<I>
where
    I: Iterator<Item = char>,
{
    fn new(mut iter: I) -> Self {
        let saved = iter.next(); // See footnote 1
        Self { iter, saved }
    }
}

impl<I> Iterator for RunLength<I>
where
    I: Iterator<Item = char>,
{
    type Item = (char, usize);

    fn next(&mut self) -> Option<Self::Item> {
        let c = self.saved.take().or_else(|| self.iter.next())?;

        let mut count = 1;
        while let Some(n) = self.iter.next() {
            if n == c {
                count += 1
            } else {
                self.saved = Some(n);
                break;
            }
        }

        Some((c, count))
    }
}

pub fn encode_tiny(data: &str) -> String {
    use std::fmt::Write;

    RunLength::new(data.chars()).fold(String::new(), |mut s, (c, count)| {
        match count {
            1 => s.push(c),
            n => write!(&mut s, "{}{}", n, c).unwrap(),
        }
        s
    })
}

1 — thanks to Stargateur for pointing out that eagerly getting the first value helps branch prediction.

A benchmark of 4MiB of random alphanumeric data, compiled with RUSTFLAGS='-C target-cpu=native':

encode (procedural)     time:   [19.888 ms 20.301 ms 20.794 ms]
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

encode (tiny)           time:   [19.150 ms 19.262 ms 19.399 ms]
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe

I believe this more clearly shows the fundamental difference between the two implementations: an iterator-based solution is resumable. Every time we call next, we need to check whether there is a previously read character waiting in self.saved. This adds a branch that the procedural code doesn't have.
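The same "remember one item between calls" state is what the standard library's Peekable adapter keeps internally. As a rough sketch (the function name run_lengths is hypothetical, and this is an alternative formulation rather than the benchmarked code), the run-length iterator can be written with Peekable::next_if_eq and std::iter::from_fn, assuming Rust 1.51 or later:

```rust
use std::iter::from_fn;

/// Yields `(character, run length)` pairs, one pair per call to `next`.
pub fn run_lengths<I>(iter: I) -> impl Iterator<Item = (char, usize)>
where
    I: Iterator<Item = char>,
{
    // `Peekable` stores at most one look-ahead item, playing the role of `saved`.
    let mut iter = iter.peekable();
    from_fn(move || {
        let c = iter.next()?;
        let mut count = 1;
        // Consume following items only while they are equal to `c`.
        while iter.next_if_eq(&c).is_some() {
            count += 1;
        }
        Some((c, count))
    })
}
```

The look-ahead slot has to be checked on every call here as well, so this version carries the same kind of per-call branch; I have not benchmarked it against the others.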

On the flip side, the iterator-based solution is more flexible: we can now compose all sorts of transformations on the data, or write directly to a file instead of a String, etc. The custom iterator can also be extended to operate on a generic item type instead of char.
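As a sketch of both extensions, assuming only that the item type implements PartialEq, the iterator can be made generic and the output sent to any std::io::Write sink. The function name encode_to is hypothetical and this variant was not part of the benchmarks:

```rust
use std::io::{self, Write};

pub struct RunLength<I: Iterator> {
    iter: I,
    saved: Option<I::Item>,
}

impl<I> RunLength<I>
where
    I: Iterator,
    I::Item: PartialEq,
{
    pub fn new(mut iter: I) -> Self {
        // Eagerly fetch the first item, as in the `char`-only version.
        let saved = iter.next();
        Self { iter, saved }
    }
}

impl<I> Iterator for RunLength<I>
where
    I: Iterator,
    I::Item: PartialEq,
{
    type Item = (I::Item, usize);

    fn next(&mut self) -> Option<Self::Item> {
        let item = self.saved.take().or_else(|| self.iter.next())?;

        let mut count = 1;
        while let Some(n) = self.iter.next() {
            if n == item {
                count += 1;
            } else {
                // Keep the first item of the next run for the following call.
                self.saved = Some(n);
                break;
            }
        }

        Some((item, count))
    }
}

/// Writes the run-length encoding of `data` to any byte sink (file, socket, Vec<u8>).
pub fn encode_to<W: Write>(data: &str, mut out: W) -> io::Result<()> {
    for (c, count) in RunLength::new(data.chars()) {
        match count {
            1 => write!(out, "{}", c)?,
            n => write!(out, "{}{}", n, c)?,
        }
    }
    Ok(())
}
```

When the sink is a file, wrapping it in std::io::BufWriter is usually worthwhile, since each write! call may otherwise turn into its own small write to the underlying file.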

Supporting code

Always got to show your work, right?

benchmark.rs

use criterion::{criterion_group, criterion_main, Criterion}; // 0.2.11
use rle::*;

fn criterion_benchmark(c: &mut Criterion) {
    let data = rand_data(4 * 1024 * 1024);

    c.bench_function("encode (procedural)", {
        let data = data.clone();
        move |b| b.iter(|| encode_proc(&data))
    });

    c.bench_function("encode (functional)", {
        let data = data.clone();
        move |b| b.iter(|| encode_iter(&data))
    });

    c.bench_function("encode (fast)", {
        let data = data.clone();
        move |b| b.iter(|| encode_slim(&data))
    });

    c.bench_function("encode (tiny)", {
        let data = data.clone();
        move |b| b.iter(|| encode_tiny(&data))
    });
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);

lib.rs

use itertools::Itertools; // 0.8.0
use rand; // 0.6.5

pub fn rand_data(len: usize) -> String {
    use rand::distributions::{Alphanumeric, Distribution};
    let mut rng = rand::thread_rng();
    Alphanumeric.sample_iter(&mut rng).take(len).collect()
}

pub fn encode_proc(source: &str) -> String {
    let mut retval = String::new();
    let firstchar = source.chars().next();
    let mut currentchar = match firstchar {
        Some(x) => x,
        None => return retval,
    };
    let mut currentcharcount: u32 = 0;
    for c in source.chars() {
        if c == currentchar {
            currentcharcount += 1;
        } else {
            if currentcharcount > 1 {
                retval.push_str(&currentcharcount.to_string());
            }
            retval.push(currentchar);
            currentchar = c;
            currentcharcount = 1;
        }
    }
    if currentcharcount > 1 {
        retval.push_str(&currentcharcount.to_string());
    }
    retval.push(currentchar);
    retval
}

pub fn encode_iter(data: &str) -> String {
    data.chars()
        .group_by(|&c| c)
        .into_iter()
        .map(|(c, group)| match group.count() {
            1 => c.to_string(),
            n => format!("{}{}", n, c),
        })
        .collect()
}

pub fn encode_slim(data: &str) -> String {
    data.chars()
        .batching(|it| {
            it.next()
                .map(|v| (v, it.take_while_ref(|&v2| v2 == v).count() + 1))
        })
        .format_with("", |(c, count), f| match count {
            1 => f(&c),
            n => f(&format_args!("{}{}", n, c)),
        })
        .to_string()
}

struct RunLength<I> {
    iter: I,
    saved: Option<char>,
}

impl<I> RunLength<I>
where
    I: Iterator<Item = char>,
{
    fn new(mut iter: I) -> Self {
        let saved = iter.next();
        Self { iter, saved }
    }
}

impl<I> Iterator for RunLength<I>
where
    I: Iterator<Item = char>,
{
    type Item = (char, usize);

    fn next(&mut self) -> Option<Self::Item> {
        let c = self.saved.take().or_else(|| self.iter.next())?;

        let mut count = 1;
        while let Some(n) = self.iter.next() {
            if n == c {
                count += 1
            } else {
                self.saved = Some(n);
                break;
            }
        }

        Some((c, count))
    }
}

pub fn encode_tiny(data: &str) -> String {
    use std::fmt::Write;

    RunLength::new(data.chars()).fold(String::new(), |mut s, (c, count)| {
        match count {
            1 => s.push(c),
            n => write!(&mut s, "{}{}", n, c).unwrap(),
        }
        s
    })
}

#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn all_the_same() {
        let data = rand_data(1024);

        let a = encode_proc(&data);
        let b = encode_iter(&data);
        let c = encode_slim(&data);
        let d = encode_tiny(&data);

        assert_eq!(a, b);
        assert_eq!(a, c);
        assert_eq!(a, d);
    }
}

195

answered Sep 18 '22 20:09

Shepmaster

Let's review the functional implementation!

Memory Allocations

One of the big issues of the functional style proposed here is that the closure passed to the map method allocates a lot. Every single run of characters is first mapped to its own String before being collected.

It also uses the format machinery, which is known to be relatively slow.

Sometimes, people try way too hard to get a "pure" functional solution. Instead, this loop:

let mut result = String::new();
for (c, group) in &source.chars().group_by(|&c| c) {
    let count = group.count();
    if count > 1 {
        result.push_str(&count.to_string());
    }

    result.push(c);
}

is about as verbose, yet it only allocates when count > 1, just like your solution does, and it does not use the format machinery either.

I would expect a significant performance win compared to the full functional solution, while at the same time still leveraging group_by for extra readability compared to the full imperative solution. Sometimes, you ought to mix and match!
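If the remaining allocations ever show up in a profile, the same loop shape can go one step further. The following is a std-only sketch, not something I have benchmarked: the name encode_reserved is hypothetical, the grouping is done with Peekable::next_if_eq (Rust 1.51 or later) instead of group_by, and the count is written with write!, which reuses the formatting machinery but avoids the temporary String from to_string:

```rust
/// Run-length encodes `source` into a buffer that is reserved up front.
pub fn encode_reserved(source: &str) -> String {
    use std::fmt::Write;

    // The encoding is never longer than the input in bytes: a run of n >= 2
    // characters becomes the decimal digits of n (at most n - 1 of them)
    // followed by one character, so `source.len()` is a safe upper bound.
    let mut result = String::with_capacity(source.len());

    let mut chars = source.chars().peekable();
    while let Some(c) = chars.next() {
        let mut count = 1usize;
        // Extend the run while the next character equals `c`.
        while chars.next_if_eq(&c).is_some() {
            count += 1;
        }
        if count > 1 {
            // Writing to a `String` cannot fail, so this `unwrap` never panics.
            write!(result, "{}", count).unwrap();
        }
        result.push(c);
    }

    result
}
```

Because the buffer is sized once, the output String should not need to reallocate while it is being filled.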

answered Sep 20 '22 20:09

Matthieu M.
