Suppose you have a large number of types and a large number of functions that each return "subsets" of these types. Let's use a small example to make the situation more explicit. Here's a simple algebraic data type: <pre class="prettyprint"><code>data T = A | B | C </code></pre> and there are two functions <code>f</code>, <code>g</code> that return a <code>T</code> <pre class="prettyprint"><code>f :: T g :: T </code></pre> For the situation at hand, assume it is important that <code>f</code> can only return a <code>A</code> or <code>B</code> and <code>g</code> can only return a <code>B</code> or <code>C</code>. I would like to encode this in the type system. Here are a few reasons/circumstances why this might be desirable: <ul> <li>Let the functions <code>f</code> and <code>g</code> have a more informative signature than just <code>::T</code> </li> <li>Enforce that implementations of <code>f</code> and <code>g</code> do not accidentally return a forbidden type that users of the implementation then accidentally use</li> <li>Allow code reuse, e.g. when helper functions are involved that only operate on subsets of type <code>T</code> </li> <li>Avoid boilerplate code (see below)</li> <li>Make refactoring (much!) easier</li> </ul> One way to do this is to split up the algebraic datatype and wrap the individual types as needed: <pre class="prettyprint"><code>data A = A data B = B data C = C data Retf = RetfA A | RetfB B data Retg = RetgB B | RetgC C f :: Retf g :: Retg </code></pre> This works, and is easy to understand, but carries a lot of boilerplate for frequent unwrapping of the return types <code>Retf</code> and <code>Retg</code>. I don't see polymorphism being of any help, here. So, probably, this is a case for dependent types. It's not really a type-level list, rather a type-level set, but I've never seen a type-level set. The goal, in the end, is to encode the domain knowledge via the types, so that compile-time checks are available, without having excessive boilerplate. (The boilerplate gets really annoying when there are lots of types and lots of functions.)

Define an auxiliary sum type (to be used as a data kind) where each branch corresponds to a version of your main type: <pre class="prettyprint"><code>{-# LANGUAGE FlexibleInstances #-} {-# LANGUAGE StandaloneKindSignatures #-} {-# LANGUAGE StandaloneDeriving #-} {-# LANGUAGE DataKinds #-} import Data.Kind import Data.Void import GHC.TypeLits data Version = AllEnabled | SomeDisabled </code></pre> Then define a type family that maps the version and the constructor name (given as a type-level <code>Symbol</code>) to the type <code>()</code> if that branch is allowed, and to the empty type <code>Void</code> if it's disallowed. <pre class="prettyprint"><code>type Enabled :: Version -> Symbol -> Type type family Enabled v ctor where Enabled SomeDisabled "C" = Void Enabled _ _ = () </code></pre> Then define your type as follows: <pre class="prettyprint"><code>type T :: Version -> Type data T v = A !(Enabled v "A") | B !(Enabled v "B") | C !(Enabled v "C") </code></pre> (The strictness annotations are there to help the exhaustivity checker.) Typeclass instances can be derived, but separately for each version: <pre class="prettyprint"><code>deriving instance Show (T AllEnabled) deriving instance Eq (T AllEnabled) deriving instance Show (T SomeDisabled) deriving instance Eq (T SomeDisabled) </code></pre> Here's an example of use: <pre class="prettyprint"><code>noC :: T SomeDisabled noC = A () main :: IO () main = print $ case noC of A _ -> "A" B _ -> "B" -- this doesn't give a warning with -Wincomplete-patterns </code></pre> This solution makes pattern-matching and construction more cumbersome, because those <code>()</code> are always there. A variation is to have one type family per branch (as in Trees that Grow) instead of a two-parameter type family.

Subset algebraic data type, or type-level set, in Haskell

Suppose you have a large number of types and a large number of functions that each return "subsets" of these types.

Let's use a small example to make the situation more explicit. Here's a simple algebraic data type:

data T = A | B | C

and there are two functions f, g that return a T

f :: T
g :: T

For the situation at hand, assume it is important that f can only return a A or B and g can only return a B or C.

I would like to encode this in the type system. Here are a few reasons/circumstances why this might be desirable:

Let the functions f and g have a more informative signature than just ::T
Enforce that implementations of f and g do not accidentally return a forbidden type that users of the implementation then accidentally use
Allow code reuse, e.g. when helper functions are involved that only operate on subsets of type T
Avoid boilerplate code (see below)
Make refactoring (much!) easier

One way to do this is to split up the algebraic datatype and wrap the individual types as needed:

data A = A
data B = B
data C = C

data Retf = RetfA A | RetfB B 
data Retg = RetgB B | RetgC C

f :: Retf
g :: Retg

This works, and is easy to understand, but carries a lot of boilerplate for frequent unwrapping of the return types Retf and Retg.

I don't see polymorphism being of any help, here.

So, probably, this is a case for dependent types. It's not really a type-level list, rather a type-level set, but I've never seen a type-level set.

The goal, in the end, is to encode the domain knowledge via the types, so that compile-time checks are available, without having excessive boilerplate. (The boilerplate gets really annoying when there are lots of types and lots of functions.)

What is algebraic data type in Haskell?

This is a type where we specify the shape of each of the elements. Wikipedia has a thorough discussion. "Algebraic" refers to the property that an Algebraic Data Type is created by "algebraic" operations. The "algebra" here is "sums" and "products": "sum" is alternation ( A | B , meaning A or B but not both)

What is the difference between data and type in Haskell?

Type and data type refer to exactly the same concept. The Haskell keywords type and data are different, though: data allows you to introduce a new algebraic data type, while type just makes a type synonym. See the Haskell wiki for details.

Does Haskell have different types?

Everything in Haskell has a type, so the compiler can reason quite a lot about your program before compiling it. Unlike Java or Pascal, Haskell has type inference.

What does () mean in Haskell?

() is very often used as the result of something that has no interesting result. For example, an IO action that is supposed to perform some I/O and terminate without producing a result will typically have type IO () .

Define an auxiliary sum type (to be used as a data kind) where each branch corresponds to a version of your main type:

{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE StandaloneKindSignatures #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE DataKinds #-}
import Data.Kind
import Data.Void
import GHC.TypeLits

data Version = AllEnabled | SomeDisabled

Then define a type family that maps the version and the constructor name (given as a type-level Symbol) to the type () if that branch is allowed, and to the empty type Void if it's disallowed.

type Enabled :: Version -> Symbol -> Type
type family Enabled v ctor where
    Enabled SomeDisabled "C" = Void
    Enabled _ _ = ()

Then define your type as follows:

type T :: Version -> Type
data T v = A !(Enabled v "A")
         | B !(Enabled v "B")
         | C !(Enabled v "C")

(The strictness annotations are there to help the exhaustivity checker.)

Typeclass instances can be derived, but separately for each version:

deriving instance Show (T AllEnabled)
deriving instance Eq (T AllEnabled)
deriving instance Show (T SomeDisabled)
deriving instance Eq (T SomeDisabled)

Here's an example of use:

noC :: T SomeDisabled
noC = A ()

main :: IO ()
main = print $ case noC of
    A _ -> "A"
    B _ -> "B"
    -- this doesn't give a warning with -Wincomplete-patterns

This solution makes pattern-matching and construction more cumbersome, because those () are always there.

A variation is to have one type family per branch (as in Trees that Grow) instead of a two-parameter type family.

Subset algebraic data type, or type-level set, in Haskell

Tags:

haskell

algebraic-data-types

dependent-type

mcmayer

People also ask

1 Answers

danidiaz

Recent Activity

Donate For Us

Subset algebraic data type, or type-level set, in Haskell

Tags:

haskell

algebraic-data-types

dependent-type

mcmayer

People also ask

1 Answers

danidiaz

Related questions

Recent Activity

Donate For Us