I am working with times series, type TSeries = [(Day, Double)]
, but need to translate the first Day
element into Double for further processing (eg plots, etc...).
Mapping the date range to a corresponding Double range [lobound, upbound], where the earliest date maps to lobound and the latest date maps to upbound, is a basic transformation. In order to implement it I need first to get the minimum and maximum of the date range. I ran into performance issue, but I am not sure exactly why and how to fix it.
Here is the code (the time-series is not assumed to be sorted):
module Main where
import Data.Time (Day, fromGregorian, diffDays)
type TSeries = [(Day, Double)]
-- time-series to (Double, Double) mapping function
toDbl :: (Day -> Double) -> TSeries -> [(Double, Double)]
toDbl mapX ts = map (\(d,x) -> (mapX d, x)) ts
-- Day to Double mapping function - fast
mapDays1 :: (Day, Double) -> (Day, Double) -> Day -> Double
mapDays1 (d0,x0) (d1,x1) d = ((fromIntegral $ diffDays d d0) * x1 + (fromIntegral $ diffDays d1 d) * x0) / diff10
where diff10 = fromIntegral $ diffDays d1 d0
-- Day to Double mapping function - slow
mapDays2 :: TSeries -> Double -> Double -> Day -> Double
mapDays2 ts x0 x1 d = mapDays1 (d0,x0) (d1,x1) d
where d0 = minimum $ map fst ts
d1 = maximum $ map fst ts
-- example time-series
func :: Int -> Double
func d = sin $ pi / 14 * (fromIntegral d)
ts = [(fromGregorian y m d, func d) | y <- [2000..2016], m <- [1..12], d <- [1..28]] :: TSeries
-- speed test
main = do
let mindate = minimum $ map fst ts
maxdate = maximum $ map fst ts
test1 = toDbl (mapDays1 (mindate,0.0) (maxdate,100.0)) ts
test2 = toDbl (mapDays2 ts 0.0 100.0) ts
-- print $ sum $ map fst test1 -- this is fast
print $ sum $ map fst test2 -- this is slow
The test that I perform (summing the X axis, first, elements) is not relevant but it is simple and illustrates the performance issue well.
Essentially mapDays1 and mapDays2 are the same except that in order to get the proper scaling I need to compute the minimum and maximum dates externally and pass them to mapDays1, while this is done "internally" within mapDays2.
The issue is that mapDays2 is very slow compared to the mapDays1 version. I suspect that the minimum and maximum calculations are called many times (as opposed to once), but I do not understand why and I am not sure how to fix mapDays2 to get a performance similar to mapDays1.
An employee performance issue is when an employee does not meet specific requirements that a job entails, such as attendance, policy objectives and standards to uphold an organization's culture. Usually, performance issues are outlined by an employee's manager directly and documented by human resources afterward.
The problem is indeed related to memoization. The problem is that you call mapDays1
and mapDays2
without passing them all their arguments, so those calls only create thunks.
That means that the thunks only get completed inside the map
, so the different calls to mapDays2
can't share their results for d0 = minimum $ map fst ts
and d1 = maximum $ map fst ts
and the maximum and minimum get re-evaluated every time. One could imagine a situation where d0
and d1
depending on the last Day
argument, in which case it would be incorrect not to re-evaluate d0
and d1
every time.
By contrast, it should be quite clear that mindate = minimum $ map fst ts
and maxdate = maximum $ map fst ts
need to be calculated only once.
mapDays2
Although we like to pretend that f x y = e
is the same as f x = \y -> e
, it isn't under the hood. You want GHC to avoid making a thunk when passed all but the last argument. Just move the d
over the equal sign. Then, the function you return will calculate only once d0
and d1
:
-- Day to Double mapping function - slow
mapDays2 :: TSeries -> Double -> Double -> Day -> Double
mapDays2 ts x0 x1 = \d -> mapDays1 (d0,x0) (d1,x1) d
where d0 = minimum $ map fst ts
d1 = maximum $ map fst ts
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With