Is it possible to use Haskell to reasonably solve large DP problems

Question

I wrote code for solving the local alignment problem with Smith–Waterman algorithm.

I want to do this with input of strings with length 10000, with reasonable memory(under 2GB ram) and reasonable time (under 5 minutes).

At first I was using bio library's built in function for this, and it runs way too slow and eat up 4GB of ram before I killed it.

Note the java program jAligner, which implements the same algorithm, can solve this problem with less than 1GB of memory and less than 20 seconds.

When I wrote an unboxed version of this, the program gives me <<loop>>. I think it's because the array need to access items in the array before the array gets built entirely.

So I wonder is it even possible to write Haskell code with similar performance for this kind of larger dynamic programming problems.

module LocalAlign where
--import Data.Array.Unboxed
import Data.Tuple
import Data.Array


localAffineAlignment :: (Char -> Char -> Int)
                                    -> Int
                                    -> Int
                                    -> String
                                    -> String
                                    -> (Int, (String, String, String, String))
localAffineAlignment f g e s' t' = (score, best) where
    n = length s'
    m = length t'
    s= array (0,n-1) $ zip [0..n-1] s'
    t= array (0,m-1) $ zip [0..m-1] t'
    table :: (Array (Int,Int) Int,Array (Int,Int) Int)
    table   = (c,d)
      where --a :: UArray (Int,Int) Int
          a = array ((0,0),(n,m)) [((x,y),a' x y)|x<-[0..n],y<-[0..m]] --s end with gap
          b = array ((0,0),(n,m)) [((x,y),b' x y)|x<-[0..n],y<-[0..m]] --t end with gap
          c = array ((0,0),(n,m)) [((x,y),fst (c' x y))|x<-[0..n],y<-[0..m]] -- best 
          d = array ((0,0),(n,m)) [((x,y),snd (c' x y))|x<-[0..n],y<-[0..m]] -- direction
          a' i j
            | i==0 || j==0  = inf
            | otherwise     = max (a!(i-1,j)-e) (c!(i-1,j)-g-e)
          b' i j
            | i==0 || j==0  = inf
            | otherwise     = max (b!(i,j-1)-e) (c!(i,j-1)-g-e)
          c' i j
            | min i j == 0  = (0,0)
            | otherwise     = maximum [(b!(i,j),3),(a!(i,j),2),(c!(i-1,j-1) + f u v,1),(0,0)]
            where u = s!(i-1)
                  v = t!(j-1)
          inf = -1073741824
    score :: Int
    score = maximum $ elems $ fst table
    best :: (String, String, String, String)
    best = (drop si $ take ei s',drop sj $ take ej t',b1,b2)
      where (a,d') = table
            (si,sj,b1,b2) = build ei ej [] []
            (ei,ej) = snd $ maximum $ map swap $ assocs a
            build x y ss tt
             | o == 0       = (x,y,ss,tt)
             | d == 1       = build (x-1) (y-1) (u:ss) (v:tt) 
             | d == 2       = build (x-1) y     (u:ss) ('-':tt) 
             | otherwise    = build x (y-1)     ('-':ss) (v:tt) 
             where o = a!(x,y)
                   d = d'!(x,y)
                   u = s!(x-1)
                   v = t!(y-1)

Don Stewart · Accepted Answer

is it even possible to write Haskell code with similar performance for this kind of larger dynamic programming problems.

Yes, of course. Use the same data structures and the same algorithms, and you will get same (or better, or worse, by constant factors) performance.

You are using (intermediate) lists and boxed arrays heavily. Consider using the vector package instead.

Mike Izbicki · Answer

You might be interested in the MemoCombinators library, which makes doing dynamic programming much easier. You can basically write the algorithm without memoization, then just annotate which variables you want memoized, and the compiler takes it from there.

Is it possible to use Haskell to reasonably solve large DP problems

Tags:

haskell

Chao Xu

2 Answers

Don Stewart

Mike Izbicki

Recent Activity

Donate For Us

Is it possible to use Haskell to reasonably solve large DP problems

Tags:

haskell

Chao Xu

2 Answers

Don Stewart

Mike Izbicki

Related questions

Recent Activity

Donate For Us