Ackermann very inefficient with Haskell/GHC

Question

I try computing Ackermann(4,1) and there's a big difference in performance between different languages/compilers. Below are results on my Core i7 3820QM, 16G, Ubuntu 12.10 64bit,

C: 1.6s, gcc -O3 (with gcc 4.7.2)

int ack(int m, int n) {   if (m == 0) return n+1;   if (n == 0) return ack(m-1, 1);   return ack(m-1, ack(m, n-1)); }  int main() {   printf("%d
", ack(4,1));   return 0; }

OCaml: 3.6s, ocamlopt (with ocaml 3.12.1)

let rec ack = function   | 0,n -> n+1   | m,0 -> ack (m-1, 1)   | m,n -> ack (m-1, ack (m, n-1)) in print_int (ack (4, 1))

Standard ML: 5.1s mlton -codegen c -cc-opt -O3 (with mlton 20100608)

fun ack 0 n = n+1   | ack m 0 = ack (m-1) 1   | ack m n = ack (m-1) (ack m (n-1)); print (Int.toString (ack 4 1));

Racket: 11.5s racket (with racket v5.3.3)

(require racket/unsafe/ops)  (define + unsafe-fx+) (define - unsafe-fx-) (define (ack m n)   (cond     [(zero? m) (+ n 1)]     [(zero? n) (ack (- m 1) 1)]     [else (ack (- m 1) (ack m (- n 1)))]))  (time (ack 4 1))

~~Haskell: unfinished, killed by system after 22s ghc -O2 (with ghc 7.4.2)~~

Haskell: 1.8s ajhc (with ajhc 0.8.0.4)

main = print $ ack 4 1   where ack :: Int -> Int -> Int         ack 0 n = n+1         ack m 0 = ack (m-1) 1         ack m n = ack (m-1) (ack m (n-1))

The Haskell version is the only one that fails to terminate properly because it takes too much memory. It freezes my machine and fills the swap space before getting killed. What can I do to improve it without heavily fuglifying the code?

EDIT: I appreciate some of the asymptotically smarter solutions, but they are not exactly what I am asking for. This is more about seeing whether the compiler handles certain patterns in a reasonably efficient way (stack, tail calls, unboxing, etc.) than computing the ackermann function.

EDIT 2: As pointed out by several responses, this seems to be a bug in recent versions of GHC. I try the same code with AJHC and get much better performance.

Thank you very much :)

Mikhail Glushenkov · Accepted Answer

NB: The high memory usage issue is a bug in the GHC RTS, where upon stack overflow and allocation of new stacks on the heap it was not checked whether garbage collection is due. It has been already fixed in GHC HEAD.

I was able to get much better performance by CPS-converting ack:

module Main where  data P = P !Int !Int  main :: IO () main = print $ ack (P 4 1) id   where     ack :: P -> (Int -> Int) -> Int     ack (P 0 n) k = k (n + 1)     ack (P m 0) k = ack (P (m-1) 1) k     ack (P m n) k = ack (P m (n-1)) (\a -> ack (P (m-1) a) k)

Your original function consumes all available memory on my machine, while this one runs in constant space.

$ time ./Test 65533 ./Test  52,47s user 0,50s system 96% cpu 54,797 total

Ocaml is still faster, however:

$ time ./test 65533./test  7,97s user 0,05s system 94% cpu 8,475 total

Edit: When compiled with JHC, your original program is about as fast as the Ocaml version:

$ time ./hs.out  65533 ./hs.out  5,31s user 0,03s system 96% cpu 5,515 total

Edit 2: Something else I've discovered: running your original program with a larger stack chunk size (+RTS -kc1M) makes it run in constant space. The CPS version is still a bit faster, though.

Edit 3: I managed to produce a version that runs nearly as fast as the Ocaml one by manually unrolling the main loop. However, it only works when run with +RTS -kc1M (Dan Doel has filed a bug about this behaviour):

{-# LANGUAGE CPP #-} module Main where  data P = P {-# UNPACK #-} !Int {-# UNPACK #-} !Int  ack0 :: Int -> Int ack0 n =(n+1)  #define C(a) a #define CONCAT(a,b) C(a)C(b)  #define AckType(M) CONCAT(ack,M) :: Int -> Int  AckType(1) AckType(2) AckType(3) AckType(4)  #define AckDecl(M,M1) \ CONCAT(ack,M) n = case n of { 0 -> CONCAT(ack,M1) 1 \ ; 1 ->  CONCAT(ack,M1) (CONCAT(ack,M1) 1) \ ; _ ->  CONCAT(ack,M1) (CONCAT(ack,M) (n-1)) }  AckDecl(1,0) AckDecl(2,1) AckDecl(3,2) AckDecl(4,3)  ack :: P -> (Int -> Int) -> Int ack (P m n) k = case m of   0 -> k (ack0 n)   1 -> k (ack1 n)   2 -> k (ack2 n)   3 -> k (ack3 n)   4 -> k (ack4 n)   _ -> case n of     0 -> ack (P (m-1) 1) k     1 -> ack (P (m-1) 1) (\a -> ack (P (m-1) a) k)     _ -> ack (P m (n-1)) (\a -> ack (P (m-1) a) k)  main :: IO () main = print $ ack (P 4 1) id

Testing:

$ time ./Test +RTS -kc1M 65533 ./Test +RTS -kc1M  6,30s user 0,04s system 97% cpu 6,516 total

Edit 4: Apparently, the space leak is fixed in GHC HEAD, so +RTS -kc1M won't be required in the future.

Petr · Answer

It seems that there is some kind of bug involved. What GHC version are you using?

With GHC 7, I get the same behavior as you do. The program consumes all available memory without producing any output.

However if I compile it with GHC 6.12.1 just with ghc --make -O2 Ack.hs, it works perfectly. It computes the result in 10.8s on my computer, while plain C version takes 7.8s.

I suggest you to report this bug on GHC web site.

Ackermann very inefficient with Haskell/GHC

Tags:

performance

optimization

benchmarking

recursion

haskell

Phil

2 Answers

Mikhail Glushenkov

Petr

Recent Activity

Donate For Us

Ackermann very inefficient with Haskell/GHC

Tags:

performance

optimization

benchmarking

recursion

haskell

Phil

2 Answers

Mikhail Glushenkov

Petr

Related questions

Recent Activity

Donate For Us