Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Haskell - Transform a list of unions into a tuple of lists

I'm looking for a way to transform a list into an n-tuple with one list for each of the n constructors in a disjoint union. The standard library defines a similar function specifically for Eithers:

partitionEithers :: [Either a b] -> ([a], [b])

I'm looking for techniques for solving the generalized problem with the following requirements:

  • convenient to write
  • as little boilerplate as possible
  • processes the list in a single pass
  • datatype-generics, metaprogramming, existing libraries etc are all permitted
Example

Here is an example specification with two proposed solutions:

partitionSum :: [MySum] -> ([A], [B], [C], [D])

data MySum
  = CaseA A
  | CaseB B
  | CaseC C
  | CaseD D

data A = A deriving Show
data B = B deriving Show
data C = C deriving Show
data D = D deriving Show

-- expect "([A,A],[B,B,B],[],[D])"
test :: IO ()
test = print . partitionSum $
  [CaseD D, CaseB B, CaseA A, CaseA A, CaseB B, CaseB B]

First attempt: n list comprehensions that traverse the list n times.

partitionSum1 :: [MySum] -> ([A], [B], [C], [D])
partitionSum1 xs =
  ( [a | CaseA a <- xs]
  , [b | CaseB b <- xs]
  , [c | CaseC c <- xs]
  , [d | CaseD d <- xs]
  )

Second attempt: a single traversal of the input list. I have to manually thread the state through the fold which makes the solution a little repetitive and annoying to write.

partitionSum2 :: [MySum] -> ([A], [B], [C], [D])
partitionSum2 = foldr f ([], [], [], [])
  where
    f x (as, bs, cs, ds) =
      case x of
        CaseA a -> (a : as, bs, cs, ds)
        CaseB b -> (as, b : bs, cs, ds)
        CaseC c -> (as, bs, c : cs, ds)
        CaseD d -> (as, bs, cs, d : ds)
like image 487
dumbo Avatar asked Aug 27 '21 19:08

dumbo


People also ask

Are lists homogeneous or heterogeneous in Haskell?

This one lists all of the vowels in the sentence, in the order they're seen within it: In Haskell, lists are homogeneous -- they can only store one kind of value ( Num, Bool, Char, etc.). If you want to store heterogeneous values, you need to use a tuple (created using parentheses):

What is a list comprehension in Haskell?

For instance, check out this paper written by three of the designers of Haskell in 1999.) List comprehensions have an output function, one or more input sets, and one or more predicates, in that order. A basic list comprehension looks like: ghci> [ <output function> | <input set>, ..., <predicate>, ... ]

What are Haskell classes?

Haskell classes (also called typeclasses) are sort of like Java interfaces in that any child class derived from a particular parent class is guaranteed to implement some specific behaviour. Classes which implement Eq can be tested for equality.

What are the different types of variables in Haskell?

...and so on. Double, Float, Int, Integer, and other predefined types also exist in Haskell, but -- as type inference gives the variable the widest possible scope (usually Num or Fractional for numbers) -- you have to explicitly declare a variable as one of these narrower types: What's the difference between Int and Integer, though?


2 Answers

In addition to the Representable answer:


A thing that came to me from seeing foldr f ([], [], [], []) was to define a monoid where the nil case is mempty

{-# DerivingVia #-}
..
import GHC.Generics (Generically(..), ..)

type Classify :: Type
type Classify = C [A] [B] [C] [D]
  deriving
  stock Generic

  deriving (Semigroup, Monoid)
  via Generically Classify

-- mempty = C [] [] [] []
-- C as bs cs ds <> C as1 bs1 cd1 ds1 = C (as ++ as1) (bs ++ bs1) (cs ++ cs1) (ds ++ ds1)

Generically will be exported from GHC.Generics in the future. It defines Classify as a semigroup and monoid through generic pointwise lifting.

With this all you need is a classifier function, that classifies a MySum into Classify and you can define partition in terms of foldMap

classify :: MySum -> Classify
classify = \case
  SumA a -> C [a] [] [] []
  SumB b -> C [] [b] [] []
  SumC c -> C [] [] [c] []
  SumD d -> C [] [] [] [d]

partition :: Foldable f => f MySum -> Classify
partition = foldMap classify
like image 98
Iceland_jack Avatar answered Sep 21 '22 09:09

Iceland_jack


As your function is a transformation from sums to products, there's a fairly simple implementation using generics-sop. This is a library which enhances GHCs generics with more specialized types that make induction on algebriac type (i.e. sums of products) simpler.

First, a prelude:

{-# LANGUAGE DeriveGeneric, StandaloneDeriving #-}

import Generics.SOP hiding ((:.:))
import qualified GHC.Generics as GHC
import GHC.Generics ((:.:)(..))


partitionSum :: (Generic t) => [t] -> NP ([] :.: NP I) (Code t)

This is the method you want to write. Let's examine its type.

  • the single argument is a list of some generic type. Pretty straightforward. Note here that Generic is the one from generics-sop, not from GHC
  • the returned value is an n-ary product (n-tuple) where each element is a list composed with NP I (itself an n-ary product, because generally, algebraic datatype constructors might have more than one field)
  • Code t is the sum-of-products type representation of t. It's a list of lists of type. e.g. Code (Either a b) ~ '[ '[a], '[b] ]. The generic value representation of t is SOP I (Code t) - a sum of of products over the "code".

To implement this, we can convert each t to its generic representation, then fold over the resulting list:


partitionSum = partitionSumGeneric . map from

partitionSumGeneric :: SListI xss => [SOP I xss] -> NP ([] :.: NP I) xss
partitionSumGeneric = foldr (\(SOP x) -> classifyGeneric x) emptyClassifier

partitionSumGeneric is pretty much the same as partitionSum, but operates on generic representations of values.

Now for the interesting part. Let's begin with the base case of our fold. This should contain empty lists in every position. generics-sop provides a handy mechanism for generating a product type with a uniform value in each position:

emptyClassifier :: SListI xs => NP ([] :.: NP I) xs
emptyClassifier = hpure (Comp1 [])

The recursive case is as follows: if the value has tag at index k, add that value to the list at index k in the accumulator. We can do this with simultaneous recursion on both the sum type (it's generic now, so a value of type NS (NP I) xs - a sum of products) and on the accumulator.

classifyGeneric :: NS (NP I) xss -> NP ([] :.: NP I) xss -> NP ([] :.: NP I) xss
classifyGeneric (Z x)  (Comp1 l :* ls) = (Comp1 $ x : l) :* ls
classifyGeneric (S xs) (      l :* ls) =              l  :* classifyGeneric xs ls

Your example with some added data to make it a bit more interesting:

data MySum
  = CaseA A
  | CaseB B
  | CaseC C
  | CaseD D

-- All that's needed for `partitionSum' to work with your type
deriving instance GHC.Generic MySum
instance Generic MySum

data A = A Int deriving Show
data B = B String Int deriving Show
data C = C deriving Show
data D = D Integer deriving Show

test = partitionSum $
  [CaseD $ D 0, CaseB $ B "x" 1, CaseA $ A 2, CaseA $ A 3, CaseB $ B "y" 4, CaseB $ B "z" 5]

the result is:

Comp1 {unComp1 = [I (A 2) :* Nil,I (A 3) :* Nil]} :* Comp1 {unComp1 = [I (B "x" 1) :* Nil,I (B "y" 4) :* Nil,I (B "z" 5) :* Nil]} :* Comp1 {unComp1 = []} :* Comp1 {unComp1 = [I (D 0) :* Nil]} :*Nil
like image 38
user2407038 Avatar answered Sep 19 '22 09:09

user2407038