Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it possible to use Haskell to reasonably solve large DP problems

Tags:

haskell

I wrote code for solving the local alignment problem with Smith–Waterman algorithm.

I want to do this with input of strings with length 10000, with reasonable memory(under 2GB ram) and reasonable time (under 5 minutes).

At first I was using bio library's built in function for this, and it runs way too slow and eat up 4GB of ram before I killed it.

Note the java program jAligner, which implements the same algorithm, can solve this problem with less than 1GB of memory and less than 20 seconds.

When I wrote an unboxed version of this, the program gives me <<loop>>. I think it's because the array need to access items in the array before the array gets built entirely.

So I wonder is it even possible to write Haskell code with similar performance for this kind of larger dynamic programming problems.

module LocalAlign where
--import Data.Array.Unboxed
import Data.Tuple
import Data.Array


localAffineAlignment :: (Char -> Char -> Int)
                                    -> Int
                                    -> Int
                                    -> String
                                    -> String
                                    -> (Int, (String, String, String, String))
localAffineAlignment f g e s' t' = (score, best) where
    n = length s'
    m = length t'
    s= array (0,n-1) $ zip [0..n-1] s'
    t= array (0,m-1) $ zip [0..m-1] t'
    table :: (Array (Int,Int) Int,Array (Int,Int) Int)
    table   = (c,d)
      where --a :: UArray (Int,Int) Int
          a = array ((0,0),(n,m)) [((x,y),a' x y)|x<-[0..n],y<-[0..m]] --s end with gap
          b = array ((0,0),(n,m)) [((x,y),b' x y)|x<-[0..n],y<-[0..m]] --t end with gap
          c = array ((0,0),(n,m)) [((x,y),fst (c' x y))|x<-[0..n],y<-[0..m]] -- best 
          d = array ((0,0),(n,m)) [((x,y),snd (c' x y))|x<-[0..n],y<-[0..m]] -- direction
          a' i j
            | i==0 || j==0  = inf
            | otherwise     = max (a!(i-1,j)-e) (c!(i-1,j)-g-e)
          b' i j
            | i==0 || j==0  = inf
            | otherwise     = max (b!(i,j-1)-e) (c!(i,j-1)-g-e)
          c' i j
            | min i j == 0  = (0,0)
            | otherwise     = maximum [(b!(i,j),3),(a!(i,j),2),(c!(i-1,j-1) + f u v,1),(0,0)]
            where u = s!(i-1)
                  v = t!(j-1)
          inf = -1073741824
    score :: Int
    score = maximum $ elems $ fst table
    best :: (String, String, String, String)
    best = (drop si $ take ei s',drop sj $ take ej t',b1,b2)
      where (a,d') = table
            (si,sj,b1,b2) = build ei ej [] []
            (ei,ej) = snd $ maximum $ map swap $ assocs a
            build x y ss tt
             | o == 0       = (x,y,ss,tt)
             | d == 1       = build (x-1) (y-1) (u:ss) (v:tt) 
             | d == 2       = build (x-1) y     (u:ss) ('-':tt) 
             | otherwise    = build x (y-1)     ('-':ss) (v:tt) 
             where o = a!(x,y)
                   d = d'!(x,y)
                   u = s!(x-1)
                   v = t!(y-1)
like image 565
Chao Xu Avatar asked Dec 12 '12 20:12

Chao Xu


2 Answers

is it even possible to write Haskell code with similar performance for this kind of larger dynamic programming problems.

Yes, of course. Use the same data structures and the same algorithms, and you will get same (or better, or worse, by constant factors) performance.

You are using (intermediate) lists and boxed arrays heavily. Consider using the vector package instead.

like image 55
Don Stewart Avatar answered Nov 19 '22 18:11

Don Stewart


You might be interested in the MemoCombinators library, which makes doing dynamic programming much easier. You can basically write the algorithm without memoization, then just annotate which variables you want memoized, and the compiler takes it from there.

like image 29
Mike Izbicki Avatar answered Nov 19 '22 17:11

Mike Izbicki