Why is factorial calculation much faster in Haskell than in Java

One of the programming problems I have come across involves calculating factorials of large numbers (numbers up to 10^5). I have seen a simple piece of Haskell code that goes like this

factorial :: (Eq x, Num x) => x -> x
factorial 0 = 1
factorial a = a * factorial (a - 1)

which implicitly handles the huge numbers and somehow runs faster, even though there is no caching involved in the code.

When I tried to solve the problem in Java, I had to use BigInteger to hold the huge numbers and also use an iterative version of factorial

public static BigInteger factorialIterative(int n)
{
    if (n == 0 || n == 1) return BigInteger.valueOf(1);
    BigInteger f = BigInteger.valueOf(1);
    for (int i = 1; i <= n; i++)
        f = f.multiply(BigInteger.valueOf(i));
    return f;
}

The above code exceeded the time limit set for the program's execution. I also tried a cached recursive version of factorial

public static BigInteger factorial(int n)
{
    if (cache[n] != null)
        return cache[n];
    else if (n == 0)
        return new BigInteger("1");
    else {
        cache[n] = BigInteger.valueOf(n).multiply(factorial(n - 1));
        return cache[n];
    }
}

which gave me an out-of-memory error (probably due to the recursion depth).

My question is: why are functional programming languages like Haskell better at handling these sorts of problems involving huge numbers (despite there being no obvious caching)? Is there a way I can make the Java code run as fast as the Haskell code?

asked Jul 11 '13 by quirkystack


5 Answers

The difference is, as shachaf said, that GHC (by default) uses GMP for Integer computations that exceed the Int range, and GMP is rather well optimised. It has nothing to do with purity, caching, tail-call optimisation, or anything of the sort.

Java's BigInteger uses more or less the naive schoolbook algorithms. If you look at the code for multiply (OpenJDK 7), the workhorse is

/**
 * Multiplies int arrays x and y to the specified lengths and places
 * the result into z. There will be no leading zeros in the resultant array.
 */
private int[] multiplyToLen(int[] x, int xlen, int[] y, int ylen, int[] z) {
    int xstart = xlen - 1;
    int ystart = ylen - 1;

    if (z == null || z.length < (xlen+ ylen))
        z = new int[xlen+ylen];

    long carry = 0;
    for (int j=ystart, k=ystart+1+xstart; j>=0; j--, k--) {
        long product = (y[j] & LONG_MASK) *
                       (x[xstart] & LONG_MASK) + carry;
        z[k] = (int)product;
        carry = product >>> 32;
    }
    z[xstart] = (int)carry;

    for (int i = xstart-1; i >= 0; i--) {
        carry = 0;
        for (int j=ystart, k=ystart+1+i; j>=0; j--, k--) {
            long product = (y[j] & LONG_MASK) *
                           (x[i] & LONG_MASK) +
                           (z[k] & LONG_MASK) + carry;
            z[k] = (int)product;
            carry = product >>> 32;
        }
        z[i] = (int)carry;
    }
    return z;
}

a quadratic digit-by-digit multiplication (digits are of course not base 10). That doesn't hurt too much here, since one of the factors is always single-digit, but indicates that not too much work has yet been put into optimising BigInteger computations in Java.

One thing that can be seen from the source is that in Java products of the form smallNumber * largeNumber are faster than largeNumber * smallNumber (in particular, if the small number is single-digit, having it as the first factor means the second loop with the nested loop doesn't run at all, so you have less loop-control overhead overall, and the loop that does run has a simpler body).

So changing

f = f.multiply(BigInteger.valueOf(i));

in your Java version to

f = BigInteger.valueOf(i).multiply(f);

gives a considerable speedup (increasing with the argument, ~2× for 25000, ~2.5× for 50000, ~2.8× for 100000).
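
For concreteness, a minimal sketch of the reordered loop (my code, not part of the original question or answer; it assumes java.math.BigInteger is imported):

public static BigInteger factorialSmallFirst(int n)
{
    BigInteger f = BigInteger.ONE;
    for (int i = 2; i <= n; i++)
        f = BigInteger.valueOf(i).multiply(f);   // small factor first: the nested second loop in multiplyToLen never runs
    return f;
}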

The computation is still much slower than the GHC/GMP combination by a factor of roughly 4 in the tested range on my box, but, well, GMP's implementation is plain better optimised.

If you make computations that often multiply two large numbers, the algorithmic difference between BigInteger's quadratic multiplication and GMP's, which uses Karatsuba or Toom-Cook when the factors are large enough (and FFT-based multiplication for really large numbers), would show.

However, if multiplying is not all you do, and you print out the factorials, hence convert them to a String, you get hit by the fact that BigInteger's toString method is abominably slow: it is roughly quadratic, so, since the computation of the factorial is itself quadratic in the length of the result, you get no [much] higher algorithmic complexity, but you do get a big constant factor on top of the computation. The Show instance for Integer is much better, O(n * (log n)^x) [not sure what x is, between 1 and 2], so converting the result to a String adds only a little to the computation time.
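
If you want to see that split on your own machine, here is a rough timing sketch (my code, not the answer's; a single run without JIT warm-up, so the numbers are only indicative):

import java.math.BigInteger;

public class FactorialTiming {
    public static void main(String[] args) {
        int n = 50_000;  // example size

        long t0 = System.nanoTime();
        BigInteger f = BigInteger.ONE;
        for (int i = 2; i <= n; i++)
            f = BigInteger.valueOf(i).multiply(f);
        long t1 = System.nanoTime();

        String s = f.toString();          // roughly quadratic decimal conversion
        long t2 = System.nanoTime();

        System.out.printf("compute: %d ms, toString: %d ms (%d digits)%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, s.length());
    }
}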

answered Nov 20 '22 by Daniel Fischer


I first want to point out two factors which are clearly not the reason for the speed difference, but have been mentioned nevertheless, in the question and some answers.

No caching / memoization

The question mentions caching, and some of the answers mention memoization. But the factorial function does not benefit from memoization, because it recursively calls itself with different arguments. So we would never hit an entry in the cache that's already filled, and the whole caching is unnecessary. Maybe people were thinking of the fibonacci function here?

For the record, Haskell would not provide automatic memoization anyway.
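
To see why a cache helps fibonacci but not factorial, here is a small illustration of my own (not from the answer) that counts cache hits; fibonacci revisits the same arguments many times, while a single factorial(n) call never asks for the same argument twice:

import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

public class CacheHits {
    static Map<Integer, BigInteger> cache = new HashMap<>();
    static int hits = 0;

    static BigInteger fib(int n) {
        if (n < 2) return BigInteger.valueOf(n);
        BigInteger cached = cache.get(n);
        if (cached != null) { hits++; return cached; }
        BigInteger r = fib(n - 1).add(fib(n - 2));
        cache.put(n, r);
        return r;
    }

    public static void main(String[] args) {
        fib(30);
        System.out.println("fib cache hits: " + hits);  // many hits; a factorial cache would see none within one call
    }
}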

No other clever optimization

Both the Java and the Haskell programs already look pretty optimal to me. Both programs use the iteration mechanism of choice of their respective languages: Java uses a loop, Haskell uses recursion. Both programs use a standard type for big integer arithmetic.

If anything, the Haskell version should be slower because it is not tail recursive, whereas the Java version uses a loop, which is the fastest looping construct available in Java.

I don't see much scope for clever high-level optimizations a compiler could make to these programs. I suspect that the observed speed difference is due to low-level details about how big integers are implemented.

So why is the Haskell version faster?

The Haskell compiler has built-in and reasonable support for Integer. That seems to be less the case for Java implementations and their big integer class. I googled for "BigInteger slow" and the results suggest that the question really should be: Why is Java's BigInteger so slow? There seem to be other big integer classes that are faster. I'm not a Java expert, so I cannot answer this variant of the question in any detail.

answered Nov 20 '22 by Toxaris


Here's a related question: https://softwareengineering.stackexchange.com/q/149167/26988

It seems that in this particular case, you're seeing the difference in optimizations of a pure vs impure function. In Haskell, all functions are pure unless they're doing IO (see link).

I think what's going on is that GHC is able to optimize the code better because of the guarantee of purity. Even though there isn't a tail call, it knows there aren't any side effects (because of purity guarantee), so it can do some optimizations the Java code can't (like automatic caching and whatnot as @andrew mentions in his answer).

A better solution in Haskell would be to use the built-in product function:

factorial n = product [1..n]

This is able to do tail-call optimization because it's just iteration. The same can be done in Java with a for loop as in your example, but it won't have the benefit of being functionally pure.
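
As a side note, roughly the same reduction shape can be written in Java with streams; this is a sketch of mine, not from the answer, and it is not claimed to be faster than the plain loop:

import java.math.BigInteger;
import java.util.stream.IntStream;

static BigInteger factorialProduct(int n) {
    // Java analogue of product [1..n], expressed as a reduction
    return IntStream.rangeClosed(1, n)
                    .mapToObj(BigInteger::valueOf)
                    .reduce(BigInteger.ONE, BigInteger::multiply);
}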

Edit:

I assumed tail-call elimination was happening, but it apparently is not. Here's the original answer for reference (it still has useful information about why Haskell may be faster than Java in some recursive contexts).

Functional programming languages like Haskell take advantage of tail call elimination.

In most programming languages, recursive calls maintain a call stack. Each recursive call allocates a new stack frame, which isn't cleaned up until it returns. For example:

call fact()
    call fact()
        call fact()
        cleanup
    cleanup
cleanup

Functional languages, however, don't need to maintain a stack. In procedural languages, it's often difficult to tell if the return value will be used by the calling function, so it's hard to optimize. In FP, however, the return value only makes sense when the recursion is complete, so you can eliminate the call stack and end up with something like this:

call fact()
call fact()
call fact()
cleanup

The call fact() lines can all happen in the same stack frame because the return value isn't needed in intermediate calculations.
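
To make that concrete, the shape a tail-call-eliminating compiler exploits is the accumulator-passing form. A sketch in Java follows (my own illustration; the JVM does not eliminate the tail call, so this still overflows the stack for large n, but it shows the shape; assumes java.math.BigInteger is imported):

public static BigInteger factAcc(int n, BigInteger acc) {
    if (n <= 1) return acc;                                       // the result is the accumulator
    return factAcc(n - 1, BigInteger.valueOf(n).multiply(acc));   // tail call: nothing left to do afterwards
}
// call as: factAcc(n, BigInteger.ONE)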

Now, to answer your question, you can solve this problem in a variety of ways, all of which aim to eliminate the call stack:

  • use a for loop instead of recursion (usually the best option)
  • return void and hope the compiler does tail call elimination
  • use a trampoline function (similar to the for-loop idea, but it looks more like recursion; see the sketch below)

Here are some related questions with examples for the above:

  • Recursion vs For loops - Factorials, Java (for loop instead of recursion for factorial)
  • Why does the JVM still not support tail-call optimization? (maybe tail call elimination isn't reliable in Java)
  • What is a trampoline function? (trampoline function in C)
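
A rough trampoline sketch in Java (my own illustration, not from the answer): each step is either a finished result or a thunk producing the next step, and a driver loop runs the steps iteratively, so the call stack never grows.

import java.math.BigInteger;
import java.util.function.Supplier;

public class FactTrampoline {

    static abstract class Bounce {
        abstract boolean finished();
    }

    static final class Done extends Bounce {
        final BigInteger result;
        Done(BigInteger result) { this.result = result; }
        boolean finished() { return true; }
    }

    static final class More extends Bounce {
        final Supplier<Bounce> next;
        More(Supplier<Bounce> next) { this.next = next; }
        boolean finished() { return false; }
    }

    // The accumulator-passing factorial expressed as trampoline steps.
    static Bounce fact(int n, BigInteger acc) {
        if (n <= 1) return new Done(acc);
        return new More(() -> fact(n - 1, BigInteger.valueOf(n).multiply(acc)));
    }

    // Driver loop: bounded stack depth regardless of n.
    static BigInteger run(Bounce b) {
        while (!b.finished()) b = ((More) b).next.get();
        return ((Done) b).result;
    }

    public static void main(String[] args) {
        System.out.println(run(fact(100_000, BigInteger.ONE)).bitLength() + " bits");
    }
}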

Note:

It's not guaranteed that recursive calls will reuse the same stack frame, so some implementations may reallocate on each recursive call. This is often easier to implement and still keeps stack usage bounded, just as reusing the frame would.

For more information about this, see these articles:

  • Tail Call
  • Tail Recursion in Haskell

answered Nov 20 '22 by beatgammit


I think the difference has nothing to do with tail-call optimization, or optimization at all. The reason I think that is that such optimization can, at its best, only achieve something like your iterative Java version.

The real reason is, IMHO, that Java BigIntegers are slow compared with Haskell's.

To establish this, I propose 2 experiments:

  1. Use the same algorithms, but use long. (The results will be garbage for higher numbers, but the computations will be done nevertheless.) Here, the Java version should be on par with Haskell; see the sketch after this list.

  2. Use a faster big integer library in the Java version. The performance should improve accordingly. There are wrappers for GMP out there, as well as improvements to the Java big integers like here. The many-fold performance improvements possible for multiplication of large numbers are telling.
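
As a rough sketch of experiment 1 (my own code, not Ingo's): with a primitive long the value silently overflows almost immediately, but the loop still performs all n multiplications, so it separates the cost of the iteration itself from the cost of BigInteger arithmetic.

public class LongFactorial {
    static long factorialLong(int n) {
        long f = 1;
        for (int i = 2; i <= n; i++)
            f *= i;                      // overflows silently: garbage value, but real work
        return f;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        long garbage = factorialLong(100_000);
        long t1 = System.nanoTime();
        System.out.println(garbage + " (garbage) in " + (t1 - t0) / 1_000_000 + " ms");
    }
}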

answered Nov 20 '22 by Ingo


The explanations below are obviously not enough. Here are some slides that explain the transformation a function goes through when its parameters are strict (as in the above example) and no thunks are generated: http://www.slideshare.net/ilyasergey/static-analyses-and-code-optimizations-in-glasgow-haskell-compiler

The Haskell version will be doing just the calculation, storing only the previous result and then applying the next multiplication, e.g. 6 x 4, whereas the Java version is doing caching (of all the historic values), memory management, GC and the like.

It's doing strictness analysis and it's automatically caching the previous computation. See: http://neilmitchell.blogspot.com.au/2008/03/lazy-evaluation-strict-vs-speculative.html?m=1

More details are on the Haskell Wiki: "Optimising compilers like GHC try to reduce the cost of laziness using strictness analysis, which attempts to determine which function arguments are always evaluated by the function, and hence can be evaluated by the caller instead."

"Strictness analysis can spot the fact that the argument n is strict, and can be represented unboxed. The resulting function won't use any heap while it is running, as you'd expect."

"Strictness analysis is a process by which GHC attempts to determine, at compile-time, which data definitely will 'always be needed'. GHC can then build code to just calculate such data, rather than the normal (higher overhead) process for storing up the calculation and executing it later."

http://www.haskell.org/haskellwiki/Performance/Strictness http://www.haskell.org/haskellwiki/GHC_optimisations

answered Nov 20 '22 by andrew