Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to take an array slice with Repa over a range

I am attempting to implement a cumulative sum function using Repa in order to calculate integral images. My current implementation looks like the following:

cumsum :: (Elt a, Num a) => Array DIM2 a -> Array DIM2 a
cumsum array = traverse array id cumsum' 
    where
        elementSlice inner outer = slice array (inner :. (0 :: Int))
        cumsum' f (inner :. outer) = Repa.sumAll $ elementSlice inner outer

The problem is in the elementSlice function. In matlab or say numpy this could be specified as array[inner,0:outer]. So what I am looking for is something along the lines of:

slice array (inner :. (Range 0 outer))

However, it seems that slices are not allowed to be specified over ranges currently in Repa. I considered using partitions as discussed in Efficient Parallel Stencil Convolution in Haskell, but this seems like a rather heavyweight approach if they would be changed every iteration. I also considered masking the slice (multiplying by a binary vector) - but this again seemed like it would perform very poorly on large matrices since I would be allocating a mask vector for every point in the matrix...

My question - does anyone know if there are plans to add support for slicing over a range to Repa? Or is there a performant way I can already attack this problem, maybe with a different approach?

like image 413
nightski Avatar asked May 29 '11 19:05

nightski


2 Answers

Extracting a subrange is an index space manipulation that is easy enough to express with fromFunction, though we should probably add a nicer wrapper for it to the API.

let arr = fromList (Z :. (5 :: Int)) [1, 2, 3, 4, 5 :: Int] 
in  fromFunction (Z :. 3) (\(Z :. ix) -> arr ! (Z :. ix + 1))

> [2,3,4]

Elements in the result are retrieved by offsetting the provided index and looking that up from the source. This technique extends naturally to arrays of higher rank.

With respect to implementing parallel folds and scans, we would do this by adding a primitive for it to the library. We can't define parallel reductions in terms of map, but we can still use the overall approach of delayed arrays. It would be a reasonably orthogonal extension.

like image 91
Ben Lippmeier Avatar answered Sep 27 '22 21:09

Ben Lippmeier


Actually, I think the main problem is that Repa does not have a scan primitive. (However, a very similar library Accelerate does.) There are two variants of scan, prefix scan and suffix scan. Given a 1D array

[a_1, ..., a_n]

prefix scan returns

[0, a_0, a_0 + a_1, ..., a_0 + ... + a_{n-1} ]

while a suffix scan produces

[a_0, a_0 + a_1, ..., a_0 + a_1 + ... + a_n ]

I assume this is what you are going for with your cumulative sum (cumsum) function.

Prefix and suffix scans generalise quite naturally to multi-dimensional arrays and have an efficient implementation based on tree reduction. A relatively old paper on the topic is "Scan Primitives for Vector Computers". Also, Conal Elliott has recently written several blog posts on deriving efficient parallel scans in Haskell.

The integral image (on a 2D array) can be calculated by doing two scans, one horizontally and one vertically. In the absence of a scan primitive I have implemented one, very inefficiently.

horizScan :: Array DIM2 Int -> Array DIM2 Int
horizScan arr = foldl addIt arr [0 .. n - 1]
  where 
    addIt :: Array DIM2 Int -> Int -> Array DIM2 Int
    addIt accum i = accum +^ vs
       where 
         vs = toAdd (i+1) n (slice arr (Z:.All:.i))
    (Z:.m:.n) = arrayExtent arr

--
-- Given an @i@ and a length @len@ and a 1D array @arr@ 
-- (of length n) produces a 2D array of dimensions n X len.
-- The columns are filled with zeroes up to row @i@.
-- Subsequently they are filled with columns equal to the 
-- values in @arr.
--
toAdd :: Int -> Int -> Array DIM1 Int -> Array DIM2 Int
toAdd i len arr = traverse arr (\sh -> sh:.(len::Int)) 
               (\_ (Z:.n:.m) -> if m >= i then arr ! (Z:.n) else 0) 

The function for calculating the integral image can then be defined as

vertScan :: Array DIM2 Int -> Array DIM2 Int
vertScan = transpose . horizScan . transpose

integralImage = horizScan . vertScan

Given that scan has been implemented for Accelerate, it shouldn't be too hard to add it to Repa. I am not sure if an efficient implementation using the existing Repa primitives is possible or not.

like image 21
Sean Seefried Avatar answered Sep 27 '22 19:09

Sean Seefried