In OCaml, how big is the price of abstraction (i.e. polymorphic functions)

Tags:

I'm still in the early phase of learning OCaml and am curious to know what the best way to extract maximum performance out of generic code in OCaml is.

As a small experiment, I've written two polymorphic functions: one in C++ and the other in OCaml that find the largest element in a given array.

What I've observed is that while in C++ you pay no penalty for this sort of abstraction, the penalty in OCaml is a whopping one degree of magnitude drop in performance. And as an aside, the C++ solution I quickly concocted is more generic that the OCaml one, but I blame that primarily on my inexperience with the language.

My question is the following: how to write and use polymorphic functions in OCaml without paying the huge performance penalty that I've just observed?

Another thing I observed for this particular problem is that my functional solution in OCaml was slower than the imperative one, whereas the "functional" solution in C++ suffered no penalties compared to the analogue imperative approach.

C++ code was compiled with g++ -std="c++0x" -O3 -o max_cpp max.cpp, using gcc-4.6.3 and OCaml code with: ocamlopt -o max_ml max.ml using ocamlopt version 3.12.1. Both of the generated executables were 32 bit and run on Ubuntu x64 11.04

I submit the code of both programs.

C++ code (not entirely safe, of course ;) )

#include <iostream>
#include <vector>
#include <numeric>
template <typename T> T max (T a, T b) {
     return (a>b)? a : b;
}

template <typename I> typename I::value_type find_max (I begin, I end) {
    auto max = *begin;
    for (begin++;begin!=end;begin++)
            if (*begin > max) max = *begin;
    return max;
}

template <typename I> typename I::value_type find_max1(I begin, I end) {
    return std::accumulate(begin, end, *begin, max< typename I::value_type> );
}

int main(int argc, char ** argv) {
    const size_t nElem = atoi(argv[1]);
    const size_t nIter = atoi(argv[2]);
    std::vector<int> intA(nElem);
    std::vector<double> dubA(nElem);
    for (int i=0;i<nIter;i++) {
            auto res1 = find_max(intA.begin(), intA.end());
            auto res2 = find_max(dubA.begin(), dubA.end());
            std::cout << "Max int: " << res1 << " " << "Max dub: " << res2 << std::endl;
    }
}

OCaml code

let max a b = if a > b then a else b

(* functional version *)
let find_max vector =
    Array.fold_right max vector vector.(0)

(* imperative version *)
let find_max1 vector =
    let length = Array.length vector in
    let max = ref (vector.(0)) in
    for i=1 to length-1 do
            if vector.(i) > !max then max := vector.(i)
    done; max

let nElems = int_of_string Sys.argv.(1)

let nIter  = int_of_string Sys.argv.(2)

let _ = let intA = Array.make nElems 0 in
    let dubA = Array.make nElems 0.0 in
    for i=1 to nIter do
            let res1 = find_max intA in
            let res2 = find_max dubA in
            print_int !res1;
            print_newline ();
            print_float !res2;
    done

However, if I "overload" the function to discriminate ints and floats then the performance of the program even exceeds that of my C++ code by 50%! I wonder why though.

let find_max_int (vector : int array) =
    let length = Array.length vector in
    let max = ref (vector.(0)) in
    for i=1 to length-1 do
            if vector.(i) > !max then max := vector.(i)
    done; max

let find_max_float (vector : float array) =
    let length = Array.length vector in
    let max = ref (vector.(0)) in
    for i=1 to length-1 do
            if vector.(i) > !max then max := vector.(i)
    done; max

The timings were performed rather crudely with: time ./max_cpp 1000000 100 and time ./max_ml 1000000 100

314

asked Dec 19 '12 16:12

PetarMarendic

2 Answers

In OCaml, the comparison operator (<) is a generic function of type 'a -> 'a -> bool (likewise for equality, etc.). This means that it is implemented by a special primitive in the runtime system that effectively run ad-hoc comparison on data structure of any type. The type-checker is able to optimize monomorphic comparisons on integers and floats into the specialized comparison operation, instead of doing the type check at runtime in the polymorphic case. A tenfold difference in efficiency is not surprising if you test only this operation.

Note that to get maximal flexibility you should abstract on the comparison operation in find_max. This would however prevent the monomorphization optimization, so an inlined version will still perform better.

I think your approach (doing micro-benchmarks and hoping to learn interesting things about the performance of real programs) is flawed. You ran into a highly specific case of OCaml performance behavior and wrongly concluded from there that the performance of polymorphic functions "is poor". This is clearly a bad conclusion out of premature optimization. Write an actually representative example of the computations you intend to run, and then reason on the performance of this real-world piece of code. You will learn very little truth from micro-benchmarks of this sort, and a lot of irrelevant details.

(It is true however that C++ code specialization approach can produce more efficient code in general than the OCaml compilation technique, that compiles only one version of the function for all types but has to make corresponding data representation compromises; in OCaml there can be an overhead related to polymorphism. However, that is observable only in fairly specific cases and you can often just make a specific inlining pass in the small critical section of your code. What you gain in return is much faster compilation (no duplication) and actually readable error messages -- like you might have if concepts were integrated in C++.)

Edit: I was in fact wrong in saying that abstracting on the comparison would prevent the type-directed optimization. The following code, while still not as fast as the inlined-by-hand version, is still noticeably faster than the version using polymorphic comparison:

let find_max1 comp vector =
    let length = Array.length vector in
    let max = ref (vector.(0)) in
    for i=1 to length-1 do
      if comp !max vector.(i) then max := vector.(i)
    done;
    !max

let find_max_int (vector : int array) = find_max1 (fun x y -> x < y) vector
let find_max_float (vector : float array) = find_max1 (fun x y -> x < y) vector

199

answered Nov 04 '22 05:11

gasche

In fact you are comparing compile-time specialized functions vs runtime-dispatched. More adequate code on ocaml side would be using functors, that theoretically could reduce the number of indirect calls to zero, but I guess it will still suffer from underoptimization. Another problem is that data representation is uniform and not specialized/inlined for user type in any case.

answered Nov 04 '22 04:11

ygrek

Related questions
                            
                                Strange behavior when static casting from a big double to an integer
                            
                                C++ copy a stream object
                            
                                Returning abstract type in base class
                            
                                "template" keyword not needed? [gcc/clang/Comeau bug?]
                            
                                Where is cout declared?
                            
                                Correct use of string storage in C and C++
                            
                                G++ Compiler Error or faulty code? : "template definition of non-template"
                            
                                Looking for a low impact c++ profiler
                            
                                How to use functions from different C++ projects in Visual Studio 2010?
                            
                                boost::format and custom printing a std containers
                            
                                Can string-based user-defined literals be strongly typed?
                            
                                How to scale to resolution in SDL?
                            
                                Why are redundant class name qualifiers allowed?
                            
                                Efficiently multiply (n-1) elements of an array [duplicate]
                            
                                g++: std::function initialized with closure type always uses heap allocation?
                            
                                C++ sizeof with bool
                            
                                Why doesn't shared_ptr<A> implicit convert to shared_ptr<A const>?
                            
                                Are there any preprocessor directives that control loop unrolling?
                            
                                I am unable to run a C++ program in Debian(Ubuntu) that works in Redhat(Centos)
                            
                                What are defaulted methods and how do I use them properly? [duplicate]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

In OCaml, how big is the price of abstraction (i.e. polymorphic functions)

Tags:

c++

polymorphism

ocaml

PetarMarendic

People also ask

2 Answers

gasche

ygrek

Recent Activity

Donate For Us