Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Haskell Repa stencil hacks

The Problem

I'm trying to understand how Repa works and I was plaing with a "blur" example code from Repa Examples package. The code uses stencil2 Quasi Quote:

[stencil2|   2  4  5  4  2
             4  9 12  9  4
             5 12 15 12  5
             4  9 12  9  4
             2  4  5  4  2 |]

Which is simply TemplateHaskell snippet, which generates a function:

makeStencil2 5 5 coeffs where
     {-# INLINE[~0] coeffs #-}
     coeffs = \ ix -> case ix of
                      Z :. -2 :. -2 -> Just 2
                      Z :. -2 :. -1 -> Just 4
                      Z :. -2 :. 0 -> Just 5
                      Z :. -2 :. 1 -> Just 4
                      Z :. -2 :. 2 -> Just 2
                      [...]
                      _ -> Nothing

Its ok to use TH, but I would love to keep the coefs in an Repa Array, so I've changed the code to use an Repa Array instead, but my code works 2 times slower comparing to the original one.

Some fancy notes

I've noticed, that Repa authors use hardcoded 7 by 7 matrix of values to get coefficients: http://hackage.haskell.org/package/repa-3.2.3.3/docs/src/Data-Array-Repa-Stencil-Dim2.html#forStencil2 (see: template7x7)

Questions

  1. I want to ask you why it is not optimised as original one and how can we fix it? I want to write a "convolve" function, which will allow me to run convolution of an stencil (a Repa Array) over an image.
  2. Do we really have to use such hardcoded matrixes to make GHC optimize the code? There is really no way to create fast Haskell code without using such "hacks"?

The Code

Original blur function:

blur    :: Monad m => Int -> Array U DIM2 Double -> m (Array U DIM2 Double)
blur !iterations arrInit
 = go iterations arrInit
 where  go !0 !arr = return arr
        go !n !arr  
         = do   arr'    <- computeP
                         $ A.smap (/ 159)
                         $ forStencil2 BoundClamp arr
                           [stencil2|   2  4  5  4  2
                                        4  9 12  9  4
                                        5 12 15 12  5
                                        4  9 12  9  4
                                        2  4  5  4  2 |]
                go (n-1) arr'

my blur function:

blur !iterations arrInit = go iterations arrInit
    where 
          stencilx7 = fromListUnboxed (Z :. 7 :. 7) 
                    [  0,  0,  0,  0,  0,  0, 0
                    ,  0,  2,  4,  5,  4,  2, 0
                    ,  0,  4,  9, 12,  9,  4, 0
                    ,  0,  5, 12, 15, 12,  5, 0
                    ,  0,  4,  9, 12,  9,  4, 0
                    ,  0,  2,  4,  5,  4,  2, 0
                    ,  0,  0,  0,  0,  0,  0, 0
                    ] :: Array U DIM2 Int
          magicf (Z :. x :. y) = Just $ fromIntegral $ unsafeIndex stencilx7 (Z:. (x+3) :. (y+3))
          go !0 !arr = return arr
          go !n !arr  
           = do   
                  arr'    <- computeP
                           $ A.smap (/ 159)
                           $ A.forStencil2 BoundClamp arr 
                            $ makeStencil2 5 5 magicf
                  go (n-1) arr'

Rest of the code:

{-# LANGUAGE PackageImports, BangPatterns, TemplateHaskell, QuasiQuotes #-}
{-# OPTIONS -Wall -fno-warn-missing-signatures -fno-warn-incomplete-patterns #-}

import Data.List
import Control.Monad
import System.Environment
import Data.Word
import Data.Array.Repa.IO.BMP
import Data.Array.Repa.IO.Timing
import Data.Array.Repa                          as A
import qualified Data.Array.Repa.Repr.Unboxed   as U
import Data.Array.Repa.Stencil                  as A
import Data.Array.Repa.Stencil.Dim2             as A
import Prelude                                  as P

main 
 = do   args    <- getArgs
        case args of
         [iterations, fileIn, fileOut]  -> run (read iterations) fileIn fileOut
         _                              -> usage

usage   = putStr $ unlines
        [ "repa-blur <iterations::Int> <fileIn.bmp> <fileOut.bmp>" ]


-- | Perform the blur.
run :: Int -> FilePath -> FilePath -> IO ()
run iterations fileIn fileOut
 = do   arrRGB  <- liftM (either (error . show) id) 
                $  readImageFromBMP fileIn

        arrRGB `deepSeqArray` return ()
        let (arrRed, arrGreen, arrBlue) = U.unzip3 arrRGB
        let comps                       = [arrRed, arrGreen, arrBlue]

        (comps', tElapsed)
         <- time $ P.mapM (process iterations) comps

        putStr $ prettyTime tElapsed

        let [arrRed', arrGreen', arrBlue'] = comps'
        writeImageToBMP fileOut
                (U.zip3 arrRed' arrGreen' arrBlue')


process :: Monad m => Int -> Array U DIM2 Word8 -> m (Array U DIM2 Word8)
process iterations 
        = promote >=> blur iterations >=> demote
{-# NOINLINE process #-}


promote :: Monad m => Array U DIM2 Word8 -> m (Array U DIM2 Double)
promote arr
 = computeP $ A.map ffs arr
 where  {-# INLINE ffs #-}
        ffs     :: Word8 -> Double
        ffs x   =  fromIntegral (fromIntegral x :: Int)
{-# NOINLINE promote #-}


demote  :: Monad m => Array U DIM2 Double -> m (Array U DIM2 Word8)
demote arr
 = computeP $ A.map ffs arr

 where  {-# INLINE ffs #-}
        ffs     :: Double -> Word8
        ffs x   =  fromIntegral (truncate x :: Int)

Compile with: ghc -O2 -threaded -fllvm -fforce-recomp Main.hs -ddump-splices

like image 520
Wojciech Danilo Avatar asked Nov 03 '13 02:11

Wojciech Danilo


1 Answers

  1. Reading convolution coeffs from the array theoretically couldn't be as fast as soldering constants right in the compiled code, because the latter approach costs nothing on machine level.

  2. No, GHC is able to boil down arbitrarily sized static stencils. See my implementation of static convolutions with fixed-vectors of lambdas:

    [dim2St| 1   2   1
             0   0   0
            -1  -2  -1 |]
    -->
    Dim2Stencil
     n3
     n3
     (VecList
        [VecList
           [\ acc a -> return (acc + a),
            \ acc a -> (return $ (acc + (2 * a))),
            \ acc a -> return (acc + a)],
         VecList
           [\ acc _ -> return acc,
            \ acc _ -> return acc,
            \ acc _ -> return acc],
         VecList
           [\ acc a -> return (acc - a),
            \ acc a -> (return $ (acc + (-2 * a))),
            \ acc a -> return (acc - a)]])
     (\ acc a reduce -> reduce acc a)
     (return 0)
    
like image 83
leventov Avatar answered Oct 14 '22 18:10

leventov