How does one write efficient Dynamic Programming algorithms in Haskell?

Question

I've been playing around with dynamic programming in Haskell. Practically every tutorial I've seen on the subject gives the same, very elegant algorithm based on memoization and the laziness of the Array type. Inspired by those examples, I wrote the following algorithm as a test:

-- pascal n returns the nth entry on the main diagonal of pascal's triangle
-- (mod a million for efficiency)
pascal :: Int -> Int
pascal n  = p ! (n,n) where
           p = listArray ((0,0),(n,n)) [f (i,j) | i <- [0 .. n], j <- [0 .. n]]

           f :: (Int,Int) -> Int
           f (_,0) = 1
           f (0,_) = 1
           f (i,j) = (p ! (i, j-1) + p ! (i-1, j)) `mod` 1000000

My only problem is efficiency. Even using GHC's -O2, this program takes 1.6 seconds to compute pascal 1000, which is about 160 times slower than an equivalent unoptimized C++ program. And the gap only widens with larger inputs.

It seems like I've tried every possible permutation of the above code, along with suggested alternatives like the data-memocombinators library, and they all had the same or worse performance. The one thing I haven't tried is the ST Monad, which I'm sure could be made to run the program only slighter slower than the C version. But I'd really like to write it in idiomatic Haskell, and I don't understand why the idiomatic version is so inefficient. I have two questions:

Why is the above code so inefficient? It seems like a straightforward iteration through a matrix, with an arithmetic operation at each entry. Clearly Haskell is doing something behind the scenes I don't understand.
Is there a way to make it much more efficient (at most 10-15 times the runtime of a C program) without sacrificing its stateless, recursive formulation (vis-a-vis an implementation using mutable arrays in the ST Monad)?

Thanks a lot.

Edit: The array module used is the standard Data.Array

Louis Wasserman · Accepted Answer

Well, the algorithm could be designed a little better. Using the vector package and being smart about only keeping one row in memory at a time, we can get something that's idiomatic in a different way:

{-# LANGUAGE BangPatterns #-}
import Data.Vector.Unboxed
import Prelude hiding (replicate, tail, scanl)

pascal :: Int -> Int
pascal !n = go 1 ((replicate (n+1) 1) :: Vector Int) where
  go !i !prevRow
    | i <= n    = go (i+1) (scanl f 1 (tail prevRow))
    | otherwise = prevRow ! n
  f x y = (x + y) `rem` 1000000

This optimizes down very tightly, especially because the vector package includes some rather ingenious tricks to transparently optimize array operations written in an idiomatic style.

Daniel Fischer · Answer

1 Why is the above code so inefficient? It seems like a straightforward iteration through a matrix, with an arithmetic operation at each entry. Clearly Haskell is doing something behind the scenes I don't understand.

The problem is that the code writes thunks to the array. Then when entry (n,n) is read, the evaluation of the thunks jumps all over the array again, recurring until finally a value not needing further recursion is found. That causes a lot of unnecessary allocation and inefficiency.

The C++ code doesn't have that problem, the values are written, and read directly without requiring further evaluation. As it would happen with an STUArray. Does

p = runSTUArray $ do
    arr <- newArray ((0,0),(n,n)) 1
    forM_ [1 .. n] $ \i ->
        forM_ [1 .. n] $ \j -> do
            a <- readArray arr (i,j-1)
            b <- readArray arr (i-1,j)
            writeArray arr (i,j) $! (a+b) `rem` 1000000
    return arr

really look so bad?

2 Is there a way to make it much more efficient (at most 10-15 times the runtime of a C program) without sacrificing its stateless, recursive formulation (vis-a-vis an implementation using mutable arrays in the ST Monad)?

I don't know of one. But there might be.

Addendum:

Once one uses STUArrays or unboxed Vectors, there's still a significant difference to the equivalent C implementation. The reason is that gcc replaces the % by a combination of multiplications, shifts and subtractions (even without optimisations), since the modulus is known. Doing the same by hand in Haskell (since GHC doesn't [yet] do that),

-- fast modulo 1000000
-- for nonnegative Ints < 2^31
-- requires 64-bit Ints
fastMod :: Int -> Int
fastMod n = n - 1000000*((n*1125899907) `shiftR` 50)

gets the Haskell versions on par with C.

Daniel Wagner · Answer

The trick is to think about how to write the whole damn algorithm at once, and then use unboxed vectors as your backing data type. For example, the following runs about 20 times faster on my machine than your code:

import qualified Data.Vector.Unboxed as V

combine :: Int -> Int -> Int
combine x y = (x+y) `mod` 1000000

pascal n = V.last $ go n where
    go 0 = V.replicate (n+1) 1
    go m = V.scanl1 combine (go (m-1))

I then wrote two main functions that called out to yours and mine with an argument of 4000; these ran in 10.42s and 0.54s respectively. Of course, as I'm sure you know, they both get blown out of the water (0.00s) by the version that uses a better algorithm:

pascal' :: Integer -> Integer
pascal :: Int -> Int
pascal' n = product [n+1..n*2] `div` product [2..n]
pascal = fromIntegral . (`mod` 1000000) . pascal' . fromIntegral

How does one write efficient Dynamic Programming algorithms in Haskell?

Tags:

performance

algorithm

functional-programming

haskell

Ivan Vendrov

3 Answers

Louis Wasserman

Daniel Fischer

Daniel Wagner

Recent Activity

Donate For Us

How does one write efficient Dynamic Programming algorithms in Haskell?

Tags:

performance

algorithm

functional-programming

haskell

Ivan Vendrov

3 Answers

Louis Wasserman

Daniel Fischer

Daniel Wagner

Related questions

Recent Activity

Donate For Us