To reinforce my (weak) functional programming skills, I am studying The NURBS Book by Piegl and Tiller, converting all the algorithms to Haskell. It is a very nice and instructive process, but I got stuck on algorithm 2.2. Here is a C-like version of the pseudo-code, reworked by me:
double[] BasisFuns( int i, double u, int p, double U[] ) {
    double N[p+1];
    double left[p+1];
    double right[p+1];
    N[0] = 1.0;
    for (j = 1; j <= p; j++) {
        left[j]  = u - U[i+1-j];
        right[j] = U[i+j] - u;
        double saved = 0.0;
        for (r = 0; r < j; r++) {
            double temp = N[r] / (right[r+1] + left[j-r]);
            N[r] = saved + right[r+1] * temp;
            saved = left[j-r] * temp;
        }
        N[j] = saved;
    }
    return N;
}
The outer loop looks easy, but the inner one, with all those necessarily-ordered modifications to the elements of N, is giving me a headache.
I started to set it up like this:
baseFunc :: RealFrac a => Int -> Int -> a -> [a] -> [a]
baseFunc i p u knots = [ ??? | j <- [1..p], r <- [0..j] ]
  where
    left  = [ u - knots !! (i+1-j) | j <- [1..p] ]
    right = [ knots !! (i+j) - u   | j <- [1..p] ]
but I feel that I may be completely off track.
I have already written a completely different and inefficient version of this function based on Eq. 2.5 in the book, so here I am looking to match the performance of the imperative version.
Although it is certainly possible to translate numerical algorithms into Haskell starting from the Fortran/Matlab/C "everything is an array" style (and, using unboxed mutable vectors, performance will generally not be much worse), this really misses the point of using a functional language. The underlying math is actually much closer to functional than to imperative programming, so the best thing is to start right there. Specifically, the recurrence formula

\[
N_{i,0}(u) = \begin{cases} 1 & \text{if } u_i \le u < u_{i+1} \\ 0 & \text{otherwise} \end{cases}
\qquad
N_{i,p}(u) = \frac{u - u_i}{u_{i+p} - u_i}\,N_{i,p-1}(u) + \frac{u_{i+p+1} - u}{u_{i+p+1} - u_{i+1}}\,N_{i+1,p-1}(u)
\]

can be translated almost literally into Haskell, much better than into an imperative language:
baseFuncs :: [Double] -- ^ Knots, \(\{u_i\}_i\)
          -> Int      -- ^ Index \(i\) at which to evaluate
          -> Int      -- ^ Spline degree \(p\)
          -> Double   -- ^ Position \(u\) at which to evaluate
          -> Double
baseFuncs us i 0 u
  | u >= us!!i, u < us!!(i+1)  = 1
  | otherwise                  = 0
baseFuncs us i p u
    = (u - us!!i)/(us!!(i+p) - us!!i) * baseFuncs us i (p-1) u
    + (us!!(i+p+1) - u)/(us!!(i+p+1) - us!!(i+1)) * baseFuncs us (i+1) (p-1) u
Unfortunately, this will not actually be efficient, for multiple reasons.
First, lists suck at random access. A simple fix is to switch to unboxed (but pure) vectors. While we're at it, let's wrap them in a newtype, because the uᵢ are supposed to be strictly increasing. Talking about types: the direct accesses are unsafe; we could fix this by bringing p and the number of segments to the type level and only allowing indices i < n-p, but I won't go into that here.
Also, it's awkward to pass us and u around all the way down the recursion; better to bind them once and then use a helper function to go down:
import Data.Vector.Unboxed (Vector, (!))
import qualified Data.Vector.Unboxed as VU

newtype Knots = Knots {getIncreasingKnotsSeq :: Vector Double}

baseFuncs :: Knots  -- ^ \(\{u_i\}_i\)
          -> Int    -- ^ Index \(i\) at which to evaluate
          -> Int    -- ^ Spline degree \(p\)
          -> Double -- ^ Position \(u\) at which to evaluate
          -> Double
baseFuncs (Knots us) i₀ p₀ u = go i₀ p₀
 where go i 0
         | u >= us!i
         , i >= VU.length us - 1 || u < us!(i+1)  = 1
         | otherwise                              = 0
       go i p
           = (u - us!i)/(us!(i+p) - us!i) * go i (p-1)
           + (us!(i+p+1) - u)/(us!(i+p+1) - us!(i+1)) * go (i+1) (p-1)
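(As a side note, the increasing-knots invariant could be checked at construction time. A minimal sketch, with mkKnots as a hypothetical smart constructor that is not used in the code above:)

-- Hypothetical smart constructor: succeeds only if the knot sequence is
-- strictly increasing, which is what the newtype is meant to guarantee.
mkKnots :: Vector Double -> Maybe Knots
mkKnots us
  | VU.and (VU.zipWith (<) us (VU.drop 1 us))  = Just (Knots us)
  | otherwise                                  = Nothing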
The other thing that's not optimal is that we don't share the lower-level evaluations between neighbouring recursive calls. (The evaluation is effectively spanning a directed graph with p²/2 nodes, but we evaluate it as a tree with 2ᵖ nodes.) That's a massive inefficiency for large p, but actually quite harmless for the typical low-degree splines.
The way to avoid this inefficiency is to memoise. The C version does this explicitly with the N array, but – this being Haskell – we can be lazy, saving ourselves the effort of allocating the correct size, by using a generic memoisation library, e.g. MemoTrie:
import Data.MemoTrie (memo2)

baseFuncs (Knots us) i₀ p₀ u = go' i₀ p₀
 where go i 0
         | u >= us!i
         , i >= VU.length us || u < us!(i+1)  = 1
         | otherwise                          = 0
       go i p
           = (u - us!i)/(us!(i+p) - us!i) * go' i (p-1)
           + (us!(i+p+1) - u)/(us!(i+p+1) - us!(i+1)) * go' (i+1) (p-1)
       go' = memo2 go
That was the no-brains version ("just memoise the entire domain of go"). As dfeuer remarks, it is easy enough to explicitly memoise only the region that actually gets evaluated, and then we can again use an efficient unboxed vector:
baseFuncs (Knots us) i₀ p₀ u = VU.unsafeHead $ gol i₀ p₀
 where gol i 0 = VU.generate (p₀+1) $ \j ->
         if u >= us!(i+j) && (i+j >= VU.length us || u < us!(i+j+1))
           then 1
           else 0
       gol i p = case gol i (p-1) of
         res' -> VU.izipWith
                   (\j l r -> let i' = i+j
                              in (u - us!i')/(us!(i'+p) - us!i') * l
                               + (us!(i'+p+1) - u)/(us!(i'+p+1) - us!(i'+1)) * r)
                   res' (VU.unsafeTail res')
(I can safely use unsafeHead and unsafeTail here, because at each recursion level the zipping reduces the length by 1; since gol i 0 produces p₀+1 elements and the recursion zips p₀ times, at the top level I still have (p₀+1) - p₀ = 1 element left.)
This version should, I think, have the same asymptotics as the C version. With some more small improvements, like precomputing the interval lengths and pre-checking that the arguments are in the allowed range so all accesses can be made unsafe, it is probably very close in performance to the C version.
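(A sketch of what such a pre-check could look like; checkedBaseFuncs is a hypothetical wrapper around the baseFuncs version above:)

-- Hypothetical wrapper: validate the index range once up front, so the
-- recursion itself would be free to use only unsafe accesses. The bound
-- i+p+1 is the largest knot index the recursion can touch.
checkedBaseFuncs :: Knots -> Int -> Int -> Double -> Maybe Double
checkedBaseFuncs ks@(Knots us) i p u
  | i >= 0 && p >= 0 && i + p + 1 < VU.length us
               = Just (baseFuncs ks i p u)
  | otherwise  = Nothing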
As – again – dfeuer remarks, it might not even be necessary to use vectors there, because I just zip together the results. For this kind of stuff, GHC can be very good at optimising code even when using plain lists. But I won't investigate the performance any further here.
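(For illustration, the zipping version with plain lists for the intermediate results could be sketched like this; baseFuncsL is a hypothetical name, and I haven't benchmarked it:)

-- Same scheme as the vector version above, but the per-level results are
-- plain lists; only the knot vector itself stays unboxed.
baseFuncsL :: Knots -> Int -> Int -> Double -> Double
baseFuncsL (Knots us) i₀ p₀ u = head $ gol i₀ p₀
 where gol i 0 = [ if u >= us!(i+j) && (i+j >= VU.length us || u < us!(i+j+1))
                     then 1
                     else 0
                 | j <- [0 .. p₀] ]
       gol i p = let res' = gol i (p-1)
                 in zipWith3 (\i' l r ->
                        (u - us!i')/(us!(i'+p) - us!i') * l
                      + (us!(i'+p+1) - u)/(us!(i'+p+1) - us!(i'+1)) * r)
                      [i ..] res' (tail res')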
The test I used to confirm it actually works:
https://gist.github.com/leftaroundabout/4fd6ef8642029607e1b222783b9d1c1e
(Disclaimer: I have zero idea of what is being calculated here.)
The access pattern in the array U seems to be as follows: from the index i outwards, we consume values to the left and to the right. We could imagine having two lists which consisted precisely of those sequences of elements. In fact, we could construct such lists out of a source list like this:
pryAt :: Int -> [a] -> ([a], [a])
pryAt i xs = go i ([], xs)
  where
    go 0 a = a
    go n (us, v : vs) = go (pred n) (v : us, vs)

-- pryAt 5 ['a'..'x']
-- ("edcba","fghijklmnopqrstuvwx")
For random-access containers, we could have specialized versions of pryAt, because traversing the whole list until we reach the i-th element will be inefficient.
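(For instance, a specialised version for boxed vectors might look like the following sketch; pryAtV is a hypothetical name, and it splits by O(1) slicing instead of walking the list:)

import qualified Data.Vector as V

-- Hypothetical vector variant of pryAt: take/drop are cheap slices here,
-- so we avoid the element-by-element traversal of the list version.
pryAtV :: Int -> V.Vector a -> ([a], [a])
pryAtV i xs = ( reverse (V.toList (V.take i xs))
              , V.toList (V.drop i xs) )

-- pryAtV 5 (V.fromList ['a'..'x'])
-- ("edcba","fghijklmnopqrstuvwx")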
In the outer loop, we have arrays N, left and right which grow with each iteration (N seems to be fully reconstructed at each iteration, as well). We could represent them as lists. In each iteration, we also consume a pair of elements of U, one from the left and one from the right.
The following datatype represents the situation at the beginning of an iteration of the outer loop:
data State = State
  { ns    :: [Double]
  , left  :: [Double]
  , right :: [Double]
  , us    :: ([Double], [Double])
  }
Assuming we already had outerStep implemented (it also needs the evaluation position u, which is not stored in the State, so its type becomes Double -> State -> State), we could simply turn the crank p times:
basis :: Int -> Double -> Int -> [Double] -> [Double]
basis i u p us = ns $ iterate (outerStep u) initial !! p
  where
    initial = State
      { ns    = [1.0]
      , left  = []
      , right = []
      -- split after index i, so that the first value consumed on the left
      -- is U[i] (giving left[1] = u - U[i]) and the first on the right is
      -- U[i+1] (giving right[1] = U[i+1] - u), matching the C code
      , us    = pryAt (i+1) us
      }
What is done in outerStep? We add new elements to left and right, then we re-create the whole N list from the beginning, carrying a saved accumulator along the way. This is a mapAccumR: the lists are kept newest-element-first, so the accumulator running from the right end to the head matches the C loop's ascending r. We need some extra info: the right values (in the same direction as N) and the left values (in reverse direction), so we need to zip them beforehand:
-- needs {-# LANGUAGE NamedFieldPuns #-} and:  import Data.List (mapAccumR)
outerStep u (State {ns, left, right, us = (ul : uls, ur : urs)}) =
  let left'  = u - ul : left
      right' = ur - u : right
      (saved', ns') = mapAccumR innerStep 0.0 $ zip3 ns right' (reverse left')
  in State
       { ns    = saved' : ns'
       , left  = left'
       , right = right'
       , us    = (uls, urs)
       }
And here are the computations of the inner step:
innerStep saved (n, r, l) =
  let temp   = n / (r + l)       -- C: temp  = N[r] / (right[r+1] + left[j-r])
      n'     = saved + r * temp  -- C: N[r]  = saved + right[r+1] * temp
      saved' = l * temp          -- C: saved = left[j-r] * temp
  in (saved', n')
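(As a hypothetical smoke test, the data of Example 2.3 from the book can be reused: with U = {0,0,0,1,2,3,4,4,5,5,5}, p = 2 and u = 5/2 we have u ∈ [u₄, u₅), i.e. i = 4, and the book gives the nonzero quadratic basis functions as 1/8, 6/8, 1/8:)

-- Expected, assuming the pryAt (i+1) fix in 'basis' above is right:
--   basisExample == [0.125, 0.75, 0.125]
basisExample :: [Double]
basisExample = basis 4 2.5 2 [0,0,0,1,2,3,4,4,5,5,5]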
In addition to correcting possible bugs, more work would remain, because the basis function in its current form is likely to leak memory (in particular, that mapAccumR will create lots of thunks). Perhaps it could be rewritten to use functions like iterate' or foldl' that keep their accumulators strict.
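(One possible direction, as a sketch: drive the iteration with a strict left fold and fully force each intermediate State via deepseq. This assumes the State record from above additionally derives Generic and NFData, which is not part of the original code:)

{-# LANGUAGE DeriveGeneric, DeriveAnyClass #-}
import Control.DeepSeq (NFData, force)
import Data.List (foldl')
import GHC.Generics (Generic)

-- Assumes:  data State = State { ... } deriving (Generic, NFData)
-- 'force' evaluates every field of the intermediate states, so thunks
-- produced by mapAccumR cannot pile up across the p outer iterations.
basisStrict :: Int -> Double -> Int -> [Double] -> [Double]
basisStrict i u p us0 =
  ns $ foldl' (\s _ -> force (outerStep u s)) initial [1 .. p]
  where
    initial = State { ns = [1.0], left = [], right = [], us = pryAt (i+1) us0 }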