The standard libraries include a function
unzip :: [(a, b)] -> ([a], [b])
The obvious way to define this is
unzip xs = (map fst xs, map snd xs)
However, this means traversing the list twice to construct the result. What I'm wondering is, is there some way to do this with only one traversal?
Appending to a list is expensive - O(n) in fact. But, as any newbie knows, we can make clever use of laziness and recursion to "append" to a list with a recursive call. Thus, zip
may easily be implemented as
zip :: [a] -> [b] -> [(a, b)]
zip (a:as) (b:bs) = (a,b) : zip as bs
This trick appear to only work if you're returning one list, however. I can't see how to extend this to allow constructing the tails of multiple lists simultaneously without ending up duplicating the source traversal.
I always presumed that the unzip
from the standard library manages to do this in a single traversal (that's kind of the whole point of implementing this otherwise trivial function in a library), but I don't actually know how it works.
Yes, it is possible:
unzip = foldr (\(a,b) ~(as,bs) -> (a:as,b:bs)) ([],[])
With explicit recursion, this would look thus:
unzip [] = ([], [])
unzip ((a,b):xs) = (a:as, b:bs)
where ( as, bs) = unzip xs
The reason that the standard library has the irrefutable pattern match ~(as, bs)
is to allow it to work actually lazily:
Prelude> let unzip' = foldr (\(a,b) ~(as,bs) -> (a:as,b:bs)) ([],[])
Prelude> let unzip'' = foldr (\(a,b) (as,bs) -> (a:as,b:bs)) ([],[])
Prelude> head . fst $ unzip' [(n,n) | n<-[1..]]
1
Prelude> head . fst $ unzip'' [(n,n) | n<-[1..]]
*** Exception: stack overflow
The following ideas stem from The Beautiful Folding.
When you have two folding operations over a list, you can always perform them at once by folding with keeping both their states. Let's express this in Haskell. First we need to capture what is a folding operation:
{-# LANGUAGE ExistentialQuantification #-}
import Control.Applicative
data Foldr a b = forall r . Foldr (a -> r -> r) r (r -> b)
A folding operation has a folding function, a start value, and a function that produces a result from a final state. By using existential quantification we can hide the type of the state, which is necessary to combine folds with different states.
Applying a Foldr
to a list is just the matter of calling foldr
with the appropriate arguments:
fold :: Foldr a b -> [a] -> b
fold (Foldr f s g) = g . foldr f s
Naturally, Foldr
is a functor, we can always append a function to the finalizing one:
instance Functor (Foldr a) where
fmap f (Foldr k s r) = Foldr k s (f . r)
More interestingly, it's also an Applicative
functor. Implementing pure
is easy, we just return a given value and don't fold anything. The most interesting part is <*>
. It creates a new fold that keeps the states of both give folds and at the end, combines the results.
instance Applicative (Foldr a) where
pure x = Foldr (\_ _ -> ()) () (\_ -> x)
(Foldr f1 s1 r1) <*> (Foldr f2 s2 r2)
= Foldr foldPair (s1, s2) finishPair
where
foldPair a ~(x1, x2) = (f1 a x1, f2 a x2)
finishPair ~(x1, x2) = r1 x1 (r2 x2)
f *> g = g
f <* g = f
Notice (as in leftaroundabout's answer) that we have lazy pattern matches ~
on tuples. This ensures that <*>
is sufficiently lazy.
Now we can express map
as a Foldr
:
fromMap :: (a -> b) -> Foldr a [b]
fromMap f = Foldr (\x xs -> f x : xs) [] id
With that, defining unzip
becomes easy. We just combine two maps, one using fst
and another using snd
:
unzip' :: Foldr (a, b) ([a], [b])
unzip' = (,) <$> fromMap fst <*> fromMap snd
unzip :: [(a, b)] -> ([a], [b])
unzip = fold unzip'
We can verify that it processes an input only once (and lazily): Both
head . snd $ unzip (repeat (3,'a'))
head . fst $ unzip (repeat (3,'a'))
yield the correct result.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With