Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Canonical outer join zip function

If you consider the (implicit) indexes of each element of a list as their keys, then zipWith is sort of like a relational inner join. It only processes the keys for which both inputs have values:

zipWith (+) [1..5] [10..20] == zipWith (+) [1..11] [10..14] == [11,13,15,17,19] 

Is there a canonical corresponding function corresponding to outer join? Something like:

outerZipWith :: (a -> b -> c) -> a -> b -> [a] -> [b] -> [c]
outerZipWith _ _ _ [] [] = []
outerZipWith f a' b' [] (b:bs) = f a' b : outerZipWith f a' b' [] bs
outerZipWith f a' b' (a:as) [] = f a b' : outerZipWith f a' b' as []
outerZipWith f a' b' (a:as) (b:bs) = f a b : outerZipWith f a' b' as bs

or maybe

outerZipWith' :: (a -> b -> c) -> Maybe a -> Maybe b -> [a] -> [b] -> [c]
outerZipWith' _ _ _ [] [] = []
outerZipWith' _ Nothing _ [] _ = []
outerZipWith' _ _ Nothing _ [] = []
outerZipWith' f a' b' [] (b:bs) = f (fromJust a') b : outerZipWith f a' b' [] bs
outerZipWith' f a' b' (a:as) [] = f a (fromJust b') : outerZipWith f a' b' as []
outerZipWith' f a' b' (a:as) (b:bs) = f a b : outerZipWith f a' b' as bs

So I can do

outerZipWith (+) 0 0 [1..5] [10..20]  == [11,13,15,17,19,15,16,17,18,19,20]
outerZipWith (+) 0 0 [1..11] [10..14] == [11,13,15,17,19,6,7,8,9,10,11]

I find myself needing it from time to time, and I'd rather use a common idiom to make my code more writable (and easier to maintain) rather than implementing outerZipWith, or doing if length as < length bs then zipWith f (as ++ repeat a) bs else zipWith f as (bs ++ repeat b).

like image 781
rampion Avatar asked Feb 08 '12 17:02

rampion


3 Answers

This kind of thing comes up a lot. It's the cogroup operation of the PACT algebra. Where I work, we make use of type classes to differentiate these three operations:

  1. zip: Structural intersection.
  2. align: Structural union.
  3. liftA2: Structural cartesian product.

This is discussed by Paul Chiusano on his blog.

like image 102
Apocalisp Avatar answered Nov 02 '22 09:11

Apocalisp


This seems awkward because you're trying to skip to the end rather than deal with the primitive operations.

First of all, zipWith is conceptually a zip followed by map (uncurry ($)). This is an important point, because (un)currying is why zipWith is possible at all. Given lists with types [a] and [b], to apply a function (a -> b -> c) and get something of type [c] you obviously need one of each input. The two ways to do this are precisely the two Applicative instances for lists, and zipWith is liftA2 for one of them. (The other instance is the standard one, which gives the cartesian product--a cross join, if you prefer).

The semantics you want don't correspond to any obvious Applicative instance, which is why it's much more difficult. Consider first an outerZip :: [a] -> [b] -> [?? a b] and what type it would have. The elements of the result list could each be an a, b, or both. This not only doesn't correspond to any standard data type, it's awkward to express in terms of them because you can't factor anything useful out of the expression (A + B + A*B).

Such a data type has its own uses, which is why I have my own package defining one. I recall there being one on hackage as well (with fewer auxiliary functions than mine, I think) but I can't remember what it was called.

Sticking to just standard stuff, you end up needing a sensible "default value" which roughly translates into having a Monoid instance and using the identity value to pad the lists out. Translating to and from an appropriate Monoid using newtype wrappers or such may not end up being any simpler than your current implementation, however.


As an aside, your remark about list indices as keys can actually be developed further; any Functor with a similar notion of a key is isomorphic to the Reader monad, a.k.a. an explicit function from keys to values. Edward Kmett has, as always, a bunch of packages encoding these abstract notions, in this case building from the representable-functors package. Might be helpful, if you don't mind writing a dozen instances just to get started at least...

like image 43
C. A. McCann Avatar answered Nov 02 '22 10:11

C. A. McCann


Instead of filling out the shorter list with a constant, why not provide a list of values to take until the end of the longer list is reached? This also eliminates the need for a Maybe since the list can be empty (or of finite length).

I couldn't find anything standard, but short of a complete re-implementation of zipWith along the lines you showed, I developed your length test version like this:

outerZipWith :: (a -> b -> c) -> [a] -> [b] -> [a] -> [b] -> [c]
outerZipWith f as bs as' bs' = take n $ zipWith f (as ++ as') (bs ++ bs')
  where n = max (length as) (length bs)
like image 3
pat Avatar answered Nov 02 '22 09:11

pat