Basically, I would describe this as a union/merge of [(,)]
combined with a running accummulator on snd
of the pair... Is there an elegant way to implement this?
(Please reference my code only in the context of answering the question. If you want to review my code, that would be great too, but please do that on this other site: https://codereview.stackexchange.com/questions/54993/merging-time-series)
A time series,
data Model a where
Variant :: [(Day, a)] -> Model a
deriving (Show)
... where type a
in [(Day, a)]
basically represents the "total balance" e.g. bank account.
Some example data,
day1 = fromGregorian 1987 10 17
day2 = fromGregorian 1987 10 18
day3 = fromGregorian 1987 10 19
day4 = fromGregorian 1987 10 20
day5 = fromGregorian 1987 10 21
day6 = fromGregorian 1987 10 22
m1 = Variant [(day1, 1), (day3, 3), (day5, 5)] :: Model Integer
m2 = Variant [(day1, 1), (day2, 2), (day4, 4), (day6, 6)] :: Model Integer
Now, merge two time series such that the "total balance" is additive,
(&+) :: Num a => Model a -> Model a -> Model a
(Variant a) &+ (Variant b) = Variant $ reverse $ fst $ go a b ([],0)
where
go [] [] (xs, c) = (xs, c)
go ((da,va):as) [] (xs, c) = go as [] (((da,va+c):xs), va+c)
go [] ((db,vb):bs) (xs, c) = go [] bs (((db,vb+c):xs), vb+c)
go a@((da,va):as) b@((db,vb):bs) (xs, c)
| da > db = go a bs (((db,vb+c):xs), vb+c)
| da < db = go as b (((da,va+c):xs), va+c)
| da == db = go as bs (((da,va+vb+c):xs), va+vb+c)
So,
what = m1 &+ m2
Variant [(1987-10-17,2),(1987-10-18,4),(1987-10-19,7),(1987-10-20,11),(1987-10-21,16),(1987-10-22,22)]
The moment I saw reverse
I felt there might be trouble. Here's a version which is lazier and works on infinite values. It does depend upon each of its inputs as being sorted by Day
, however. First we seek to merge
the two streams
merge :: Num a => Model a -> Model a -> Model a
merge (Variant xs) (Variant ys) = Variant (go xs ys) where
go [] ys = ys
go xs [] = xs
go xx@((dx, vx):xs) yy@((dy, vy):ys)
| dx < dy = (dx, vx) : go xs yy
| dx > dy = (dy, vy) : go xx ys
| otherwise = (dx, vx + vy) : go xs ys
It's basically the core of what you had, but much simpler. Typically, if you can make a computation lazy in Haskell then it's worth the effort as it may be more efficient. After this point, we'll accumulate
accum :: Num a => Model a -> Model a
accum (Variant xs) = Variant (go xs 0) where
go [] _ = []
go ((d, v):xs) c = let c' = v + c in (d, c') : go xs c'
And then combining these two we get the desired result
-- (&+) :: Num a => Model a -> Model a -> Model a
-- a &+ b = accum (merge a b)
Though, it might be better to leave merge
and accum
as the exposed API since they can be combined in many ways other than just (&+)
.
It may be worth noting that the obvious way to write the accum
function as a right fold
accum' :: Num a => Model a -> Model a
accum' (Variant xs) = Variant (snd $ foldr go (0, []) xs) where
go (d, v) (c, res) = let c' = v + c in (c', (d, c'):res)
doesn't work because it accumulates the parameter from the rear of the list. Trying the left fold works but we have to reverse the list—a double sin against laziness.
accum'' :: Num a => Model a -> Model a
accum'' (Variant xs) = Variant (reverse $ snd $ foldl go (0, []) xs) where
go (d, v) (c, res) = let c' = v + c in (c', (d, c'):res)
which gives a hint as to what was happening in the original version. We can write this as a right fold, however, but we have to be a little tricky in order to pass the accumulator in the right direction
accum' :: Num a => Model a -> Model a
accum' (Variant xs) = Variant (foldr go (const []) xs 0) where
go (d, v) rest c = let c' = v + c in (d, c') : rest c'
Note here that the result of foldr go (const []) xs
is a value of type a -> [a]
.
The association list here is actually a "red herring". This is really a more general question of how to do a merge using a function to combine terms with equal keys. The association list version is the same question, but with a pre-applied Schwartzian transform.
Stated this way, we want a function with this type:
mergeCombineWith :: (a -> a -> Ordering) -> (a -> a -> a) -> [a] -> [a] -> [a]
where the first parameter defines the ordering and the second parameter is the combining function to be applied to elements with equal keys. We assume that the input lists are pre-sorted. If we also assume that neither of the input lists has any duplicate keys, or that we also want to combine duplicates within the same input list, then the solution is simple. Given a traditional merge function, of type:
mergeWith :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
then our desired function is obtained by grouping the results of the traditional merge:
mergeCombineWith cmp comb xs ys =
map combs . groupBy eq $ mergeWith cmp xs ys
where
combs = foldr1 comb
eq x y = isEQ $ cmp x y
isEQ EQ = True
isEQ _ = False
More generally, it would be interesting to consider merging many lists, not just two. That could be done in a straightforward way using a fold:
multiMergeCombineWith :: (a -> a -> Ordering) -> (a -> a -> a) -> [[a]] -> [a]
multiMergeCombineWith cmp comb = foldr1 $ mergeCombineWith cmp comb
But that solution would be inefficient if there are many lists to combine. A better way would be to put the lists into a priority queue and always examine first the lists whose first elements are smallest in the given ordering. There are several good priority queue implementations on Hackage.
However, once again, if you have a solution to the multi-list problem for a traditional merge, you don't need to re-invent the wheel. First do the traditional merge, then group and combine, as above.
Thanks to Daniel Wagner for pointing out to me that versions of the two traditional merge functions can be found in the data-ordlist package on Hackage, called there mergeBy and mergeAllBy.
EDIT: A new priority queue implementation was recently published on Hackage.See the discussion about it in this reddit thread.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With