
How does one write efficient Dynamic Programming algorithms in Haskell?

I've been playing around with dynamic programming in Haskell. Practically every tutorial I've seen on the subject gives the same, very elegant algorithm based on memoization and the laziness of the Array type. Inspired by those examples, I wrote the following algorithm as a test:

import Data.Array

-- pascal n returns the nth entry on the main diagonal of Pascal's triangle
-- (mod a million for efficiency)
pascal :: Int -> Int
pascal n = p ! (n,n) where
           p = listArray ((0,0),(n,n)) [f (i,j) | i <- [0 .. n], j <- [0 .. n]]

           f :: (Int,Int) -> Int
           f (_,0) = 1
           f (0,_) = 1
           f (i,j) = (p ! (i, j-1) + p ! (i-1, j)) `mod` 1000000

My only problem is efficiency. Even using GHC's -O2, this program takes 1.6 seconds to compute pascal 1000, which is about 160 times slower than an equivalent unoptimized C++ program. And the gap only widens with larger inputs.

It seems like I've tried every possible permutation of the above code, along with suggested alternatives like the data-memocombinators library, and they all had the same or worse performance. The one thing I haven't tried is the ST Monad, which I'm sure could be made to run only slightly slower than the C version. But I'd really like to write it in idiomatic Haskell, and I don't understand why the idiomatic version is so inefficient. I have two questions:

  1. Why is the above code so inefficient? It seems like a straightforward iteration through a matrix, with an arithmetic operation at each entry. Clearly Haskell is doing something behind the scenes I don't understand.

  2. Is there a way to make it much more efficient (at most 10-15 times the runtime of a C program) without sacrificing its stateless, recursive formulation (vis-a-vis an implementation using mutable arrays in the ST Monad)?

Thanks a lot.

Edit: The array module used is the standard Data.Array

asked Jun 05 '12 by Ivan Vendrov

3 Answers

Well, the algorithm could be designed a little better. Using the vector package and being smart about only keeping one row in memory at a time, we can get something that's idiomatic in a different way:

{-# LANGUAGE BangPatterns #-}
import Data.Vector.Unboxed
import Prelude hiding (replicate, tail, scanl)

pascal :: Int -> Int
pascal !n = go 1 ((replicate (n+1) 1) :: Vector Int) where
  -- go keeps only the previous row; each new row is a prefix scan of it
  go !i !prevRow
    | i <= n    = go (i+1) (scanl f 1 (tail prevRow))
    | otherwise = prevRow ! n
  f x y = (x + y) `rem` 1000000

This optimizes down very tightly, especially because the vector package includes some rather ingenious tricks to transparently optimize array operations written in an idiomatic style.
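
For reference, a minimal driver to try this out, assuming the snippet above is compiled with GHC's -O2 as in the question (the driver itself is my addition, not part of the answer):

-- Minimal test driver (my own addition): prints the value the question
-- benchmarks, i.e. pascal 1000, assuming the definitions above are in scope.
main :: IO ()
main = print (pascal 1000)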

answered by Louis Wasserman


1 Why is the above code so inefficient? It seems like a straightforward iteration through a matrix, with an arithmetic operation at each entry. Clearly Haskell is doing something behind the scenes I don't understand.

The problem is that the code writes thunks to the array. Then when entry (n,n) is read, the evaluation of those thunks jumps all over the array again, recursing until a value that needs no further recursion is found. That causes a lot of unnecessary allocation and inefficiency.
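
To make the point concrete, here is a small illustration (my own, not from the answer) of the fact that a boxed Data.Array stores unevaluated thunks:

import Data.Array

-- Each element of a boxed Array is an unevaluated thunk. Building this
-- array does not evaluate any element, so the `undefined` at index 0 is
-- harmless until something actually demands it.
thunky :: Array Int Int
thunky = listArray (0, 2) [undefined, 1 + 1, 2 * 3]

demo :: Int
demo = thunky ! 1   -- evaluates to 2; index 0 is never forced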

The C++ code doesn't have that problem: the values are written and read directly, without requiring further evaluation, just as they would be with an STUArray. Does

p = runSTUArray $ do
    arr <- newArray ((0,0),(n,n)) 1
    forM_ [1 .. n] $ \i ->
        forM_ [1 .. n] $ \j -> do
            a <- readArray arr (i,j-1)
            b <- readArray arr (i-1,j)
            writeArray arr (i,j) $! (a+b) `rem` 1000000
    return arr

really look so bad?
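
(For completeness, a self-contained version of that fragment might look like the following; the pascalST wrapper and the imports are my own framing, not part of the answer.)

import Control.Monad (forM_)
import Data.Array.ST (runSTUArray, newArray, readArray, writeArray)
import Data.Array.Unboxed (UArray, (!))

-- Self-contained sketch wrapping the runSTUArray fragment above.
pascalST :: Int -> Int
pascalST n = p ! (n, n)
  where
    p :: UArray (Int, Int) Int
    p = runSTUArray $ do
        arr <- newArray ((0, 0), (n, n)) 1
        forM_ [1 .. n] $ \i ->
            forM_ [1 .. n] $ \j -> do
                a <- readArray arr (i, j - 1)
                b <- readArray arr (i - 1, j)
                writeArray arr (i, j) $! (a + b) `rem` 1000000
        return arr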

2 Is there a way to make it much more efficient (at most 10-15 times the runtime of a C program) without sacrificing its stateless, recursive formulation (vis-a-vis an implementation using mutable arrays in the ST Monad)?

I don't know of one. But there might be.

Addendum:

Once one uses STUArrays or unboxed Vectors, there's still a significant difference to the equivalent C implementation. The reason is that gcc replaces the % by a combination of multiplications, shifts and subtractions (even without optimisations), since the modulus is known. Doing the same by hand in Haskell (since GHC doesn't [yet] do that),

import Data.Bits (shiftR)

-- fast modulo 1000000
-- for nonnegative Ints < 2^31
-- requires 64-bit Ints
fastMod :: Int -> Int
fastMod n = n - 1000000*((n*1125899907) `shiftR` 50)

gets the Haskell versions on par with C.
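
For instance (a sketch of my own, not from the answer), the combining step in the scan-based solutions above could call fastMod instead of `rem`:

-- Hypothetical variant of the combining step from the vector-based answer
-- above, using fastMod instead of `rem` 1000000. Both arguments are
-- already reduced, so x + y stays well below 2^31 as fastMod requires.
fastCombine :: Int -> Int -> Int
fastCombine x y = fastMod (x + y)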

answered by Daniel Fischer


The trick is to think about how to write the whole damn algorithm at once, and then use unboxed vectors as your backing data type. For example, the following runs about 20 times faster on my machine than your code:

import qualified Data.Vector.Unboxed as V

combine :: Int -> Int -> Int
combine x y = (x+y) `mod` 1000000

pascal :: Int -> Int
pascal n = V.last $ go n where
    go 0 = V.replicate (n+1) 1          -- row 0 of the table: all ones
    go m = V.scanl1 combine (go (m-1))  -- row m is a running scan of row m-1
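
To see what the scan is doing, here is a small illustration (my own addition, assuming the import and combine above are in scope): each successive scanl1 turns one row of the table into the next.

-- Illustration (my own): the first few rows produced by successive scans
-- for n = 2, before V.last picks out the final entry. Prints:
--   [1,1,1]
--   [1,2,3]
--   [1,3,6]
rowsDemo :: IO ()
rowsDemo = mapM_ (print . V.toList . go) [0 .. 2]
  where
    n = 2 :: Int
    go 0 = V.replicate (n + 1) 1
    go m = V.scanl1 combine (go (m - 1))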

I then wrote two main functions that called out to yours and mine with an argument of 4000; these ran in 10.42s and 0.54s respectively. Of course, as I'm sure you know, they both get blown out of the water (0.00s) by the version that uses a better algorithm, namely the closed form for the central binomial coefficient C(2n, n):

pascal' :: Integer -> Integer
pascal' n = product [n+1..n*2] `div` product [2..n]

pascal :: Int -> Int
pascal = fromIntegral . (`mod` 1000000) . pascal' . fromIntegral

answered by Daniel Wagner