Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Computing the mean of a list efficiently in Haskell

I've designed a function to compute the mean of a list. Although it works fine, but I think it may not be the best solution due to it takes two functions rather than one. Is it possible to do this job done with only one recursive function ?

calcMeanList (x:xs) = doCalcMeanList (x:xs) 0 0

doCalcMeanList (x:xs) sum length =  doCalcMeanList xs (sum+x) (length+1)
doCalcMeanList [] sum length = sum/length
like image 402
snowmantw Avatar asked Jul 21 '10 15:07

snowmantw


1 Answers

About the best you can do is this version:

import qualified Data.Vector.Unboxed as U

data Pair = Pair {-# UNPACK #-}!Int {-# UNPACK #-}!Double

mean :: U.Vector Double -> Double
mean xs = s / fromIntegral n
  where
    Pair n s       = U.foldl' k (Pair 0 0) xs
    k (Pair n s) x = Pair (n+1) (s+x)

main = print (mean $ U.enumFromN 1 (10^7))

It fuses to an optimal loop in Core (the best Haskell you could write):

main_$s$wfoldlM'_loop :: Int#
                              -> Double#
                              -> Double#
                              -> Int#
                              -> (# Int#, Double# #)    
main_$s$wfoldlM'_loop =
  \ (sc_s1nH :: Int#)
    (sc1_s1nI :: Double#)
    (sc2_s1nJ :: Double#)
    (sc3_s1nK :: Int#) ->
    case ># sc_s1nH 0 of _ {
      False -> (# sc3_s1nK, sc2_s1nJ #);
      True ->
        main_$s$wfoldlM'_loop
          (-# sc_s1nH 1)
          (+## sc1_s1nI 1.0)
          (+## sc2_s1nJ sc1_s1nI)
          (+# sc3_s1nK 1)
    }

And the following assembly:

Main_mainzuzdszdwfoldlMzqzuloop_info:
.Lc1pN:
        testq %r14,%r14
        jg .Lc1pQ
        movq %rsi,%rbx
        movsd %xmm6,%xmm5
        jmp *(%rbp)
.Lc1pQ:
        leaq 1(%rsi),%rax
        movsd %xmm6,%xmm0
        addsd %xmm5,%xmm0
        movsd %xmm5,%xmm7
        addsd .Ln1pS(%rip),%xmm7
        decq %r14
        movsd %xmm7,%xmm5
        movsd %xmm0,%xmm6
        movq %rax,%rsi
        jmp Main_mainzuzdszdwfoldlMzqzuloop_info

Based on Data.Vector. For example,

$ ghc -Odph --make A.hs -fforce-recomp
[1 of 1] Compiling Main             ( A.hs, A.o )
Linking A ...
$ time ./A
5000000.5
./A  0.04s user 0.00s system 93% cpu 0.046 total

See the efficient implementations in the statistics package.

like image 101
Don Stewart Avatar answered Nov 15 '22 18:11

Don Stewart