Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Unzip in one pass?

Tags:

list

haskell

The standard libraries include a function

unzip :: [(a, b)] -> ([a], [b])

The obvious way to define this is

unzip xs = (map fst xs, map snd xs)

However, this means traversing the list twice to construct the result. What I'm wondering is, is there some way to do this with only one traversal?

Appending to a list is expensive - O(n) in fact. But, as any newbie knows, we can make clever use of laziness and recursion to "append" to a list with a recursive call. Thus, zip may easily be implemented as

zip :: [a] -> [b] -> [(a, b)]
zip (a:as) (b:bs) = (a,b) : zip as bs

This trick appear to only work if you're returning one list, however. I can't see how to extend this to allow constructing the tails of multiple lists simultaneously without ending up duplicating the source traversal.

I always presumed that the unzip from the standard library manages to do this in a single traversal (that's kind of the whole point of implementing this otherwise trivial function in a library), but I don't actually know how it works.

like image 797
MathematicalOrchid Avatar asked Nov 29 '22 15:11

MathematicalOrchid


2 Answers

Yes, it is possible:

unzip    =  foldr (\(a,b) ~(as,bs) -> (a:as,b:bs)) ([],[])

With explicit recursion, this would look thus:

unzip [] = ([], [])
unzip ((a,b):xs) = (a:as, b:bs)
             where (  as,   bs) = unzip xs

The reason that the standard library has the irrefutable pattern match ~(as, bs) is to allow it to work actually lazily:

Prelude> let unzip' = foldr (\(a,b) ~(as,bs) -> (a:as,b:bs)) ([],[])
Prelude> let unzip'' = foldr (\(a,b) (as,bs) -> (a:as,b:bs)) ([],[])
Prelude> head . fst $ unzip' [(n,n) | n<-[1..]]
1
Prelude> head . fst $ unzip'' [(n,n) | n<-[1..]]
*** Exception: stack overflow

like image 91
leftaroundabout Avatar answered Dec 10 '22 11:12

leftaroundabout


The following ideas stem from The Beautiful Folding.

When you have two folding operations over a list, you can always perform them at once by folding with keeping both their states. Let's express this in Haskell. First we need to capture what is a folding operation:

{-# LANGUAGE ExistentialQuantification #-}
import Control.Applicative

data Foldr a b = forall r . Foldr (a -> r -> r) r (r -> b)

A folding operation has a folding function, a start value, and a function that produces a result from a final state. By using existential quantification we can hide the type of the state, which is necessary to combine folds with different states.

Applying a Foldr to a list is just the matter of calling foldr with the appropriate arguments:

fold :: Foldr a b -> [a] -> b
fold (Foldr f s g) = g . foldr f s

Naturally, Foldr is a functor, we can always append a function to the finalizing one:

instance Functor (Foldr a) where
    fmap f (Foldr k s r) = Foldr k s (f . r)

More interestingly, it's also an Applicative functor. Implementing pure is easy, we just return a given value and don't fold anything. The most interesting part is <*>. It creates a new fold that keeps the states of both give folds and at the end, combines the results.

instance Applicative (Foldr a) where
    pure x  = Foldr (\_ _ -> ()) () (\_ -> x)
    (Foldr f1 s1 r1) <*> (Foldr f2 s2 r2)
            = Foldr foldPair (s1, s2) finishPair
      where
        foldPair a ~(x1, x2) = (f1 a x1, f2 a x2)
        finishPair ~(x1, x2) = r1 x1 (r2 x2)
    f *> g  = g
    f <* g  = f

Notice (as in leftaroundabout's answer) that we have lazy pattern matches ~ on tuples. This ensures that <*> is sufficiently lazy.

Now we can express map as a Foldr:

fromMap :: (a -> b) -> Foldr a [b]
fromMap f = Foldr (\x xs -> f x : xs) [] id

With that, defining unzip becomes easy. We just combine two maps, one using fst and another using snd:

unzip' :: Foldr (a, b) ([a], [b])
unzip' = (,) <$> fromMap fst <*> fromMap snd

unzip :: [(a, b)] -> ([a], [b])
unzip = fold unzip'

We can verify that it processes an input only once (and lazily): Both

head . snd $ unzip (repeat (3,'a'))
head . fst $ unzip (repeat (3,'a'))

yield the correct result.

like image 36
Petr Avatar answered Dec 10 '22 11:12

Petr