Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In OCaml, how big is the price of abstraction (i.e. polymorphic functions)

I'm still in the early phase of learning OCaml and am curious to know what the best way to extract maximum performance out of generic code in OCaml is.

As a small experiment, I've written two polymorphic functions: one in C++ and the other in OCaml that find the largest element in a given array.

What I've observed is that while in C++ you pay no penalty for this sort of abstraction, the penalty in OCaml is a whopping one degree of magnitude drop in performance. And as an aside, the C++ solution I quickly concocted is more generic that the OCaml one, but I blame that primarily on my inexperience with the language.

My question is the following: how to write and use polymorphic functions in OCaml without paying the huge performance penalty that I've just observed?

Another thing I observed for this particular problem is that my functional solution in OCaml was slower than the imperative one, whereas the "functional" solution in C++ suffered no penalties compared to the analogue imperative approach.

C++ code was compiled with g++ -std="c++0x" -O3 -o max_cpp max.cpp, using gcc-4.6.3 and OCaml code with: ocamlopt -o max_ml max.ml using ocamlopt version 3.12.1. Both of the generated executables were 32 bit and run on Ubuntu x64 11.04

I submit the code of both programs.

C++ code (not entirely safe, of course ;) )

#include <iostream>
#include <vector>
#include <numeric>
template <typename T> T max (T a, T b) {
     return (a>b)? a : b;
}

template <typename I> typename I::value_type find_max (I begin, I end) {
    auto max = *begin;
    for (begin++;begin!=end;begin++)
            if (*begin > max) max = *begin;
    return max;
}

template <typename I> typename I::value_type find_max1(I begin, I end) {
    return std::accumulate(begin, end, *begin, max< typename I::value_type> );
}

int main(int argc, char ** argv) {
    const size_t nElem = atoi(argv[1]);
    const size_t nIter = atoi(argv[2]);
    std::vector<int> intA(nElem);
    std::vector<double> dubA(nElem);
    for (int i=0;i<nIter;i++) {
            auto res1 = find_max(intA.begin(), intA.end());
            auto res2 = find_max(dubA.begin(), dubA.end());
            std::cout << "Max int: " << res1 << " " << "Max dub: " << res2 << std::endl;
    }
}

OCaml code

let max a b = if a > b then a else b

(* functional version *)
let find_max vector =
    Array.fold_right max vector vector.(0)

(* imperative version *)
let find_max1 vector =
    let length = Array.length vector in
    let max = ref (vector.(0)) in
    for i=1 to length-1 do
            if vector.(i) > !max then max := vector.(i)
    done; max

let nElems = int_of_string Sys.argv.(1)

let nIter  = int_of_string Sys.argv.(2)

let _ = let intA = Array.make nElems 0 in
    let dubA = Array.make nElems 0.0 in
    for i=1 to nIter do
            let res1 = find_max intA in
            let res2 = find_max dubA in
            print_int !res1;
            print_newline ();
            print_float !res2;
    done

However, if I "overload" the function to discriminate ints and floats then the performance of the program even exceeds that of my C++ code by 50%! I wonder why though.

let find_max_int (vector : int array) =
    let length = Array.length vector in
    let max = ref (vector.(0)) in
    for i=1 to length-1 do
            if vector.(i) > !max then max := vector.(i)
    done; max

let find_max_float (vector : float array) =
    let length = Array.length vector in
    let max = ref (vector.(0)) in
    for i=1 to length-1 do
            if vector.(i) > !max then max := vector.(i)
    done; max

The timings were performed rather crudely with: time ./max_cpp 1000000 100 and time ./max_ml 1000000 100

like image 314
PetarMarendic Avatar asked Dec 19 '12 16:12

PetarMarendic


People also ask

Does OCaml allow polymorphic functions?

In OCaml, higher-order and polymorphic functions are powerful tools for code reuse. Higher-order functions are those functions that take other functions as arguments or return functions as results, while polymorphic functions are functions that act over values with many different types.

What is a zero cost abstraction?

Zero Cost Abstractions means adding higher-level programming concepts, like generics, collections and so on do not come with a run-time cost, only compile time cost (the code will be slower to compile).

Does OCaml support parametric polymorphism?

OCaml also supports row polymorphism, which is a form of parametric polymorphism with constraints.

What is parametric polymorphism OCaml?

Parametric polymorphism is a type that is parameterized by other types. For example, the identity function in OCaml: let id (x : 'a) : 'a = x (* id : 'a -> 'a *) assert (id 3 = 3) assert (id "hello" = "hello") The 'a , which refers to the Greek letter α, is a type parameter meaning any type can be substituted there.


2 Answers

In OCaml, the comparison operator (<) is a generic function of type 'a -> 'a -> bool (likewise for equality, etc.). This means that it is implemented by a special primitive in the runtime system that effectively run ad-hoc comparison on data structure of any type. The type-checker is able to optimize monomorphic comparisons on integers and floats into the specialized comparison operation, instead of doing the type check at runtime in the polymorphic case. A tenfold difference in efficiency is not surprising if you test only this operation.

Note that to get maximal flexibility you should abstract on the comparison operation in find_max. This would however prevent the monomorphization optimization, so an inlined version will still perform better.

I think your approach (doing micro-benchmarks and hoping to learn interesting things about the performance of real programs) is flawed. You ran into a highly specific case of OCaml performance behavior and wrongly concluded from there that the performance of polymorphic functions "is poor". This is clearly a bad conclusion out of premature optimization. Write an actually representative example of the computations you intend to run, and then reason on the performance of this real-world piece of code. You will learn very little truth from micro-benchmarks of this sort, and a lot of irrelevant details.

(It is true however that C++ code specialization approach can produce more efficient code in general than the OCaml compilation technique, that compiles only one version of the function for all types but has to make corresponding data representation compromises; in OCaml there can be an overhead related to polymorphism. However, that is observable only in fairly specific cases and you can often just make a specific inlining pass in the small critical section of your code. What you gain in return is much faster compilation (no duplication) and actually readable error messages -- like you might have if concepts were integrated in C++.)

Edit: I was in fact wrong in saying that abstracting on the comparison would prevent the type-directed optimization. The following code, while still not as fast as the inlined-by-hand version, is still noticeably faster than the version using polymorphic comparison:

let find_max1 comp vector =
    let length = Array.length vector in
    let max = ref (vector.(0)) in
    for i=1 to length-1 do
      if comp !max vector.(i) then max := vector.(i)
    done;
    !max

let find_max_int (vector : int array) = find_max1 (fun x y -> x < y) vector
let find_max_float (vector : float array) = find_max1 (fun x y -> x < y) vector
like image 199
gasche Avatar answered Nov 04 '22 05:11

gasche


In fact you are comparing compile-time specialized functions vs runtime-dispatched. More adequate code on ocaml side would be using functors, that theoretically could reduce the number of indirect calls to zero, but I guess it will still suffer from underoptimization. Another problem is that data representation is uniform and not specialized/inlined for user type in any case.

like image 26
ygrek Avatar answered Nov 04 '22 04:11

ygrek