Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Functional Programming Performance

I've recently started to use Scala to solve some programming challenges at Codeforces in order to exercise functional programming skills. Doing so I came across one particular challenge I wasn't able to solve in a way respecting the given execution time limit of 1000ms; the Painting Fence problem.

I tried various different ways, starting with a straight forward recursive solution, trying a similar approach using streams instead of lists, and eventually trying to reduce list manipulations by working a bit more with indices. I ended up having stack overflow exceptions on larger tests that I was able to fix using Scala's TailCall. But still, while the solution correctly solves the problem, it is too slow to complete within 1000ms. In addition to that, there's a C++ implementation shown that is ridiculously fast in comparison (< 50ms). Now I do understand that Scala will be slower compared to C++ in many cases, and I also understand that I could write a more imperative-style solution in Scala that would probably perform a lot better. Still, I'm wondering if I am missing something more fundamental here, because I have a hard time believing that functional programming is that much slower in general (and I am pretty new to functional programming).

Here is my scala code that you can paste in the REPL including the example that takes >1000ms:

import scala.util.control.TailCalls._

def solve(l: List[(Int, Int)]): Int = {

  def go(from: Int, to: Int, prevHeight: Int): TailRec[Int] = {
    val max = to - from
    val currHeight = l.slice(from, to).minBy(_._1)._1
    val hStrokes = currHeight - prevHeight
    val splits = l.slice(from, to).filter(_._1 - currHeight == 0).map(_._2)
    val indices = from :: splits.flatMap(x => List(x, x+1)) ::: List(to)
    val subLists = indices.grouped(2).filter(xs => xs.last - xs.head > 0)

    val trampolines = subLists.map(xs => tailcall(go(xs.head, xs.last, currHeight)))
    val sumTrampolines = trampolines.foldLeft(done(hStrokes))((b, a) => b.flatMap(bVal =>
      a.map(aVal => aVal + bVal)))
    sumTrampolines.flatMap(v => done(max).map(m => Math.min(m, v)))
  }
  go(0, l.size, 0).result
}

val lst = (1 to 5000).toList.zipWithIndex
val res = solve(lst)

And for comparison, here's a C++ example achieving the same thing written by Bugman (includes some read/write from console that I didn't include in the Scala version above):

#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <algorithm>
#include <vector>
#include <string>
#include <set>
#include <map>
#include <cmath>
#include <memory.h>
using namespace std;
typedef long long ll;

const int N = 1e6+6;
const int T = 1e6+6;

int a[N];
int t[T], d;

int rmq(int i, int j){
    int r = i;
    for(i+=d,j+=d; i<=j; ++i>>=1,--j>>=1){
        if(i&1) r=a[r]>a[t[i]]?t[i]:r;
        if(~j&1) r=a[r]>a[t[j]]?t[j]:r;
    }
    return r;
}

int calc(int l, int r, int h){
    if(l>r) return 0;

    int m = rmq(l,r);
    int mn = a[m];
    int res = min(r-l+1, calc(l,m-1,mn)+calc(m+1,r,mn)+mn-h);
    return res;
}

int main(){
    //freopen("input.txt","r",stdin);// freopen("output.txt","w",stdout);

    int n, m;

    scanf("%d",&n);
    for(int i=0;i<n;++i) scanf("%d",&a[i]);

    a[n] = 2e9;
    for(d=1;d<n;d<<=1);
    for(int i=0;i<n;++i) t[i+d]=i;
    for(int i=n+d;i<d+d;++i) t[i]=n;
    for(int i=d-1;i;--i) t[i]=a[t[i*2]]<a[t[i*2+1]]?t[i*2]:t[i*2+1];

    printf("%d\n",calc(0,n-1,0));

    return 0;
}

At least before I introduced the explicit tail calls, the more functional style seemed more natural to me to solve the problem than a more imperative solution. So I'd be really happy to learn more on what I should be attentive of when writing functional code in order to still get acceptable performance.

like image 252
Marc Avatar asked Sep 30 '22 06:09

Marc


1 Answers

Relying so heavily on indices is arguably not really idiomatic functional style, and combining indexing and lists is a recipe for less-than-ideal performance.

Here's an index-free implementation:

import scala.util.control.TailCalls._

def solve(xs: Vector[Int]): Int = {
  def go(xs: Vector[Int], previous: Int): TailRec[Int] = {
    val min = xs.min

    splitOn(xs, min).foldLeft(done(min - previous)) {
      case (acc, part) => for {
        total <- acc
        cost  <- go(part, min)
      } yield total + cost
    }.map(math.min(xs.size, _))
  }

  go(xs, 0).result
}

That's not quite the whole story, though—I've factored out the splitting part into a method named splitOn that takes a sequence and a delimiter. Because this is a very simple and generic operation, it's a good candidate for optimization. The following is a quick attempt:

def splitOn[A](xs: Vector[A], delim: A): Vector[Vector[A]] = {
  val builder = Vector.newBuilder[Vector[A]]
  var i = 0
  var start = 0

  while (i < xs.size) {
    if (xs(i) == delim) {
      if (i != start) {
        builder += xs.slice(start, i)
      }
      start = i + 1
    }
    i += 1
  }

  if (i != start) builder += xs.slice(start, i)

  builder.result
}

While this implementation is imperative, from the outside the method is perfectly functional—it doesn't have side effects, etc.

This is often a good way to improve the performance of functional code: we've separated our program into a generic piece (splitting a list on a delimiter) and our problem-specific logic. Because the former is so simple, we can ask treat it (and test it) as a black box, while keeping the code we're using to solve the problem clean and functional.

In this case the performance still isn't great—this implementation is about twice as fast as yours on my machine—but I don't think you're going to get much better than that while trampolining with TailCalls.

like image 81
Travis Brown Avatar answered Oct 04 '22 18:10

Travis Brown