Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merge two association lists with running accumulator

Tags:

haskell

Basically, I would describe this as a union/merge of [(,)] combined with a running accummulator on snd of the pair... Is there an elegant way to implement this?

(Please reference my code only in the context of answering the question. If you want to review my code, that would be great too, but please do that on this other site: https://codereview.stackexchange.com/questions/54993/merging-time-series)

A time series,

data Model a where
  Variant  :: [(Day, a)] -> Model a
  deriving (Show)

... where type a in [(Day, a)] basically represents the "total balance" e.g. bank account.

Some example data,

day1 = fromGregorian 1987 10 17
day2 = fromGregorian 1987 10 18
day3 = fromGregorian 1987 10 19
day4 = fromGregorian 1987 10 20
day5 = fromGregorian 1987 10 21
day6 = fromGregorian 1987 10 22

m1 = Variant [(day1, 1), (day3, 3), (day5, 5)] :: Model Integer
m2 = Variant [(day1, 1), (day2, 2), (day4, 4), (day6, 6)] :: Model Integer

Now, merge two time series such that the "total balance" is additive,

(&+) :: Num a => Model a -> Model a -> Model a
(Variant a) &+ (Variant b) = Variant $ reverse $ fst $ go a b ([],0)
  where
    go             []             [] (xs, c) = (xs, c)
    go   ((da,va):as)             [] (xs, c) = go as [] (((da,va+c):xs), va+c)
    go             []   ((db,vb):bs) (xs, c) = go [] bs (((db,vb+c):xs), vb+c)
    go a@((da,va):as) b@((db,vb):bs) (xs, c)
      | da > db  = go  a bs (((db,vb+c):xs), vb+c)
      | da < db  = go as  b (((da,va+c):xs), va+c)
      | da == db = go as bs (((da,va+vb+c):xs), va+vb+c)

So,

what = m1 &+ m2

Variant [(1987-10-17,2),(1987-10-18,4),(1987-10-19,7),(1987-10-20,11),(1987-10-21,16),(1987-10-22,22)]
like image 312
brooksbp Avatar asked Jun 23 '14 04:06

brooksbp


2 Answers

The moment I saw reverse I felt there might be trouble. Here's a version which is lazier and works on infinite values. It does depend upon each of its inputs as being sorted by Day, however. First we seek to merge the two streams

merge :: Num a => Model a -> Model a -> Model a
merge (Variant xs) (Variant ys) = Variant (go xs ys) where
  go [] ys = ys
  go xs [] = xs
  go xx@((dx, vx):xs) yy@((dy, vy):ys)
    | dx < dy   = (dx, vx)      : go xs yy
    | dx > dy   = (dy, vy)      : go xx ys
    | otherwise = (dx, vx + vy) : go xs ys

It's basically the core of what you had, but much simpler. Typically, if you can make a computation lazy in Haskell then it's worth the effort as it may be more efficient. After this point, we'll accumulate

accum :: Num a => Model a -> Model a
accum (Variant xs) = Variant (go xs 0) where
  go []          _ = []
  go ((d, v):xs) c = let c' = v + c in (d, c') : go xs c'

And then combining these two we get the desired result

-- (&+) :: Num a => Model a -> Model a -> Model a
-- a &+ b = accum (merge a b)

Though, it might be better to leave merge and accum as the exposed API since they can be combined in many ways other than just (&+).


It may be worth noting that the obvious way to write the accum function as a right fold

accum' :: Num a => Model a -> Model a
accum' (Variant xs) = Variant (snd $ foldr go (0, []) xs) where
  go (d, v) (c, res) = let c' = v + c in (c', (d, c'):res)

doesn't work because it accumulates the parameter from the rear of the list. Trying the left fold works but we have to reverse the list—a double sin against laziness.

accum'' :: Num a => Model a -> Model a
accum'' (Variant xs) = Variant (reverse $ snd $ foldl go (0, []) xs) where
  go (d, v) (c, res) = let c' = v + c in (c', (d, c'):res)

which gives a hint as to what was happening in the original version. We can write this as a right fold, however, but we have to be a little tricky in order to pass the accumulator in the right direction

accum' :: Num a => Model a -> Model a
accum' (Variant xs) = Variant (foldr go (const []) xs 0) where
  go (d, v) rest c = let c' = v + c in (d, c') : rest c'

Note here that the result of foldr go (const []) xs is a value of type a -> [a].

like image 199
J. Abrahamson Avatar answered Nov 15 '22 08:11

J. Abrahamson


The association list here is actually a "red herring". This is really a more general question of how to do a merge using a function to combine terms with equal keys. The association list version is the same question, but with a pre-applied Schwartzian transform.

Stated this way, we want a function with this type:

mergeCombineWith :: (a -> a -> Ordering) -> (a -> a -> a) -> [a] -> [a] -> [a]

where the first parameter defines the ordering and the second parameter is the combining function to be applied to elements with equal keys. We assume that the input lists are pre-sorted. If we also assume that neither of the input lists has any duplicate keys, or that we also want to combine duplicates within the same input list, then the solution is simple. Given a traditional merge function, of type:

mergeWith :: (a -> a -> Ordering) -> [a] -> [a] -> [a]

then our desired function is obtained by grouping the results of the traditional merge:

mergeCombineWith cmp comb xs ys =
    map combs . groupBy eq $ mergeWith cmp xs ys
  where
    combs = foldr1 comb
    eq x y = isEQ $ cmp x y
    isEQ EQ = True
    isEQ _  = False

More generally, it would be interesting to consider merging many lists, not just two. That could be done in a straightforward way using a fold:

multiMergeCombineWith :: (a -> a -> Ordering) -> (a -> a -> a) -> [[a]] -> [a]
multiMergeCombineWith cmp comb = foldr1 $ mergeCombineWith cmp comb

But that solution would be inefficient if there are many lists to combine. A better way would be to put the lists into a priority queue and always examine first the lists whose first elements are smallest in the given ordering. There are several good priority queue implementations on Hackage.

However, once again, if you have a solution to the multi-list problem for a traditional merge, you don't need to re-invent the wheel. First do the traditional merge, then group and combine, as above.

Thanks to Daniel Wagner for pointing out to me that versions of the two traditional merge functions can be found in the data-ordlist package on Hackage, called there mergeBy and mergeAllBy.

EDIT: A new priority queue implementation was recently published on Hackage.See the discussion about it in this reddit thread.

like image 45
Yitz Avatar answered Nov 15 '22 08:11

Yitz