Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Performance issue

I am working with times series, type TSeries = [(Day, Double)], but need to translate the first Day element into Double for further processing (eg plots, etc...).

Mapping the date range to a corresponding Double range [lobound, upbound], where the earliest date maps to lobound and the latest date maps to upbound, is a basic transformation. In order to implement it I need first to get the minimum and maximum of the date range. I ran into performance issue, but I am not sure exactly why and how to fix it.

Here is the code (the time-series is not assumed to be sorted):

module Main where

import Data.Time (Day, fromGregorian, diffDays)

type TSeries = [(Day, Double)]

-- time-series to (Double, Double) mapping function
toDbl :: (Day -> Double) -> TSeries -> [(Double, Double)]
toDbl mapX ts = map (\(d,x) -> (mapX d, x)) ts

-- Day to Double mapping function - fast
mapDays1 :: (Day, Double) -> (Day, Double) -> Day -> Double
mapDays1 (d0,x0) (d1,x1) d = ((fromIntegral $ diffDays d d0) * x1 + (fromIntegral $ diffDays d1 d) * x0) / diff10
    where diff10 = fromIntegral $ diffDays d1 d0

-- Day to Double mapping function - slow
mapDays2 :: TSeries -> Double -> Double -> Day -> Double
mapDays2 ts x0 x1 d = mapDays1 (d0,x0) (d1,x1) d
    where d0 = minimum $ map fst ts
          d1 = maximum $ map fst ts

-- example time-series
func :: Int -> Double
func d = sin $ pi / 14 * (fromIntegral d)
ts = [(fromGregorian y m d, func d) | y <- [2000..2016], m <- [1..12], d <- [1..28]] :: TSeries

-- speed test
main = do
    let mindate = minimum $ map fst ts
        maxdate = maximum $ map fst ts
        test1 = toDbl (mapDays1 (mindate,0.0) (maxdate,100.0)) ts
        test2 = toDbl (mapDays2 ts 0.0 100.0) ts

    -- print $ sum $ map fst test1 -- this is fast
    print $ sum $ map fst test2 -- this is slow

The test that I perform (summing the X axis, first, elements) is not relevant but it is simple and illustrates the performance issue well.

Essentially mapDays1 and mapDays2 are the same except that in order to get the proper scaling I need to compute the minimum and maximum dates externally and pass them to mapDays1, while this is done "internally" within mapDays2.

The issue is that mapDays2 is very slow compared to the mapDays1 version. I suspect that the minimum and maximum calculations are called many times (as opposed to once), but I do not understand why and I am not sure how to fix mapDays2 to get a performance similar to mapDays1.

like image 960
Janthelme Avatar asked Nov 29 '16 08:11

Janthelme


People also ask

What are performance issues in the organization?

An employee performance issue is when an employee does not meet specific requirements that a job entails, such as attendance, policy objectives and standards to uphold an organization's culture. Usually, performance issues are outlined by an employee's manager directly and documented by human resources afterward.


1 Answers

The problem is indeed related to memoization. The problem is that you call mapDays1 and mapDays2 without passing them all their arguments, so those calls only create thunks.

The problem

That means that the thunks only get completed inside the map, so the different calls to mapDays2 can't share their results for d0 = minimum $ map fst ts and d1 = maximum $ map fst ts and the maximum and minimum get re-evaluated every time. One could imagine a situation where d0 and d1 depending on the last Day argument, in which case it would be incorrect not to re-evaluate d0 and d1 every time.

By contrast, it should be quite clear that mindate = minimum $ map fst ts and maxdate = maximum $ map fst ts need to be calculated only once.

Fixing mapDays2

Although we like to pretend that f x y = e is the same as f x = \y -> e, it isn't under the hood. You want GHC to avoid making a thunk when passed all but the last argument. Just move the d over the equal sign. Then, the function you return will calculate only once d0 and d1:

-- Day to Double mapping function - slow
mapDays2 :: TSeries -> Double -> Double -> Day -> Double
mapDays2 ts x0 x1 = \d -> mapDays1 (d0,x0) (d1,x1) d
    where d0 = minimum $ map fst ts
          d1 = maximum $ map fst ts
like image 52
Alec Avatar answered Oct 19 '22 03:10

Alec