I implemented an exponentially weighted moving average (EWMA) in Python 3 and in Haskell (compiled). Both take about the same time. However, when this function is applied twice, the Haskell version slows down unpredictably (more than 1000 times, whereas the Python version is only about 2 times slower).
Python3 version:
import numpy as np

def ewma_f(y, tau):
    a = 1/tau
    avg = np.zeros_like(y)
    for i in range(1, len(y)):
        avg[i] = a*y[i-1] + (1-a)*avg[i-1]
    return avg
Haskell with lists:
ewmaL :: [Double] -> Double -> [Double]
ewmaL ys tau = reverse $ e (reverse ys) (1.0/tau)
  where e [x] a = [a*x]
        e (x:xs) a = (a*x + (1-a)*(head $ e xs a) : e xs a)
Haskell with arrays:
import qualified Data.Vector as V

ewmaV :: V.Vector Double -> Double -> V.Vector Double
ewmaV x tau = V.map f $ V.enumFromN 0 (V.length x)
  where
    f (-1) = 0
    f n = (x V.! n)*a + (1-a)*(f (n-1))
    a = 1/tau
In all cases, the computation takes about the same time (tested on an array of 10000 elements). The Haskell code was compiled without any flags, though "ghc -O2" didn't make any difference.
I used the computed EWMA to compute the absolute deviation from it; I then applied the EWMA function to this deviation.
Python3:
def ewmd_f(y, tau):
    ewma = ewma_f(y, tau)
    return ewma_f(np.abs(y - ewma), tau)
It takes about twice as long as ewma_f.
Haskell with lists:
ewmdL :: [Double] -> Double -> [Double]
ewmdL xs tau = ewmaL devs tau
  where devs = zipWith (\x y -> abs $ x - y) xs avg
        avg = ewmaL xs tau
Haskell with vectors:
ewmdV :: V.Vector Double -> Double -> V.Vector Double
ewmdV xs tau = ewmaV devs tau
  where devs = V.zipWith (\x y -> abs $ x - y) xs avg
        avg = ewmaV xs tau
Both ewmd versions run more than 1000 times slower than their ewma counterparts.
I evaluated the Python 3 code with:
from time import time
x = np.sin(np.arange(10000))
tau = 100.0
t1 = time()
ewma = ewma_f(x, tau)
t2 = time()
ewmd = ewmd_f(x, tau)
t3 = time()
print("EWMA took {} s".format(t2-t1))
print("EWMD took {} s".format(t3-t2))
I evaluated the Haskell code with:
import System.CPUTime

timeIt f = do
    start <- getCPUTime
    end <- seq f getCPUTime
    let d = fromIntegral (end - start) / (10^12)
    return (show d)
main = do
    let n = 10000 :: Int
    let tau = 100.0
    let l = map sin [0.0..(fromIntegral $ n-1)]
    let x = V.map sin $ V.enumFromN 0 n
    putStrLn "Vectors"
    aV <- timeIt $ V.last $ ewmaV x tau
    putStrLn $ "EWMA (vector) took " ++ aV
    dV <- timeIt $ V.last $ ewmdV x tau
    putStrLn $ "EWMD (vector) took " ++ dV
    putStrLn ""
    putStrLn "Lists"
    lV <- timeIt $ last $ ewmaL l tau
    putStrLn $ "EWMA (list) took " ++ lV
    lD <- timeIt $ last $ ewmdL l tau
    putStrLn $ "EWMD (list) took " ++ lD
Your Python and Haskell algorithms may look equivalent, but they actually have different asymptotic complexity:
ewmaV x tau = V.map f $ V.enumFromN 0 (V.length x)
  where
    f (-1) = 0
    f n = (x V.! n)*a + (1-a)*(f (n-1))  -- Recursion!
    a = 1/tau
This makes the Haskell implementation O(n²), which is unacceptable. The reason you don't notice this when only evaluating V.last . ewmaV is laziness: to evaluate the last element, you don't need to process the entire vector; you only run one recursion loop across x. OTOH, ewmdV actually forces all of the elements, hence the extra cost.
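A quick way to see the quadratic cost directly is to force the whole result instead of only its last element, e.g. with the timeIt harness from the question (a sketch; V.sum is just one convenient way to demand every element):

-- Sketch: force every element of ewmaV's result, not only the last one.
-- With the O(n²) definition above, this is dramatically slower than timing
-- V.last, even though ewmdV is never called.
aVfull <- timeIt $ V.sum $ ewmaV x tau
putStrLn $ "EWMA (vector, fully forced) took " ++ aVfull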
One simple (but not optimal, I daresay) way to get around this is to memoise the result:
ewmaV :: V.Vector Double -> Double -> V.Vector Double
ewmaV x tau = result
  where result = V.map f $ V.enumFromN 0 (V.length x)
        f 0 = V.head x * a
        f n = (x V.! n)*a + (1-a)*(result V.! (n-1))
        a = 1/tau
(This works because Data.Vector is boxed: each element of result starts out as a lazily evaluated thunk, so the self-reference is legal and each element is computed only once.) Now ewmdV takes ≈twice as long as ewmaV:
sagemuej@sagemuej-X302LA:/tmp$ ghc wtmpf-file6122.hs -O2 && ./wtmpf-file6122
[1 of 1] Compiling Main ( wtmpf-file6122.hs, wtmpf-file6122.o )
Linking wtmpf-file6122 ...
Vectors
EWMA (vector) took 4.932e-3
EWMD (vector) took 7.758e-3
(Those timings aren't very reliable; for accurate performance tests use criterion.)
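For instance, a minimal criterion harness might look like this (a sketch; it assumes the ewmaV and ewmdV definitions above are in scope, and nf forces the full result vector so laziness can't skew the timings):

import Criterion.Main
import qualified Data.Vector as V

-- Sketch of a criterion benchmark; the bench names are arbitrary.
main :: IO ()
main = defaultMain
  [ bench "ewmaV" $ nf (`ewmaV` tau) x
  , bench "ewmdV" $ nf (`ewmdV` tau) x
  ]
  where
    x = V.map sin $ V.enumFromN 0 10000
    tau = 100.0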
A better solution IMO would be to avoid this indexing business entirely – we're not writing Fortran, are we? IIR filters like EWMA are better implemented in a purely “local” manner; this can be expressed nicely in Haskell with a state monad, so you're independent of what container the data ships in.
import Data.Traversable
import Control.Monad (forM)
import Control.Monad.State

ewma :: Traversable t => t Double -> Double -> t Double
ewma x tau = (`evalState` 0) . forM x $
      \xi -> state $ \carry ->
        let yi = a*xi + (1-a)*carry
        in  (yi, yi)
  where a = 1/tau
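With this linear-time ewma, the EWMD from the question becomes linear as well; here is a sketch for the list case (ewmd' is a name introduced here for illustration):

-- Sketch: the question's EWMD on top of the O(n) state-monad ewma.
-- Both passes are now linear, so this should take only about twice as
-- long as a single ewma, matching the Python behaviour.
ewmd' :: [Double] -> Double -> [Double]
ewmd' xs tau = ewma devs tau
  where devs = zipWith (\x y -> abs (x - y)) xs (ewma xs tau)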
While we're at generalising: there's no reason to restrict this to Double data; you can filter any kind of value that can be scaled and interpolated.
{-# LANGUAGE FlexibleContexts #-}
import Data.VectorSpace

ewma :: (Traversable t, VectorSpace v, Fractional (Scalar v))
          => t v -> Scalar v -> t v
ewma x tau = (`evalState` zeroV) . forM x $
      \xi -> state $ \carry ->
        let yi = a*^xi ^+^ (1-a)*^carry
        in  (yi, yi)
  where a = 1/tau
This way, you can in principle use the same filter for motion-blurring video data stored in a lazily streamed infinite list of picture frames, as well as for lowpass-filtering a radio signal pulse stored in an unboxed Vector. (VU.Vector actually has no Traversable instance; you need to substitute oforM, from the mono-traversable package, then.)
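For example, since vector-space defines instances for pairs (with Scalar (Double, Double) ~ Double), smoothing a 2D trajectory needs no new filter code; a hypothetical usage sketch:

-- Hypothetical usage: smooth a stream of 2D points with the generalised
-- ewma above; tau = 10 is an arbitrary choice for illustration.
smoothPath :: [(Double, Double)] -> [(Double, Double)]
smoothPath path = ewma path 10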
The following makes two recursive calls:
ewmaL ys tau = reverse $ e (reverse ys) (1.0/tau)
  where e [x] a = [a*x]
        e (x:xs) a = (a*x + (1-a)*(head $ e xs a) : e xs a)
We can make one recursive call, and use the result for both cases:
ewmaLcse :: [Double] -> Double -> [Double]
ewmaLcse ys tau = reverse $ e (reverse ys) (1.0/tau)
  where e [x] a = [a*x]
        e (x:xs) a = (a*x + (1-a)*(head zs) : zs)
          where zs = e xs a
I also chose to benchmark the sum of the list, so as to force all of it to be computed:
lV <- timeIt $ sum $ ewmaL l tau
putStrLn $ "EWMA (list) took "++lV
lVcse <- timeIt $ sum $ ewmaLcse l tau
putStrLn $ "EWMAcse (list) took "++lVcse
Results, with n=10000
Lists
EWMA (list) took 2.384
EWMAcse (list) took 0.0
with n=20000
Lists
EWMA (list) took 16.472
EWMAcse (list) took 4.0e-3
By the way, one can also use standard library loops for this specific recursion. Here I resorted to mapAccumL: no need to double-reverse lists.
{-# LANGUAGE BangPatterns #-}
import Data.List (mapAccumL)

ewmaL2 :: [Double] -> Double -> [Double]
ewmaL2 ys tau = snd $ mapAccumL e 0 ys
  where a = 1/tau
        e !prevAvg !y = (nextAvg, nextAvg)
          where !nextAvg = a*y + (1-a)*prevAvg
(Actually, the fact that I am using (nextAvg,nextAvg) means that a simpler scanl would also do the job. Oh, well...)
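For completeness, here is a sketch of that scanl variant (scanl prepends its seed, so we drop it so the output lines up with ewmaL2's):

-- Sketch of the scanl version hinted at above. scanl emits the initial
-- accumulator (0) as its first element, which ewmaL2 does not, so drop it.
ewmaL3 :: [Double] -> Double -> [Double]
ewmaL3 ys tau = drop 1 $ scanl step 0 ys
  where a = 1/tau
        step prevAvg y = a*y + (1-a)*prevAvg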