Imagine a language which doesn't allow multiple value constructors for a data type. Instead of writing
data Color = White | Black | Blue
we would have
data White = White
data Black = Black
data Blue = Black
type Color = White :|: Black :|: Blue
where :|:
(here it's not |
to avoid confusion with sum types) is a built-in type union operator. Pattern matching would work in the same way
show :: Color -> String
show White = "white"
show Black = "black"
show Blue = "blue"
As you can see, in contrast to coproducts it results in a flat structure so you don't have to deal with injections. And, unlike sum types, it allows to randomly combine types resulting in greater flexibility and granularity:
type ColorsStartingWithB = Black :|: Blue
I believe it wouldn't be a problem to construct recursive data types as well
data Nil = Nil
data Cons a = Cons a (List a)
type List a = Cons a :|: Nil
I know union types are present in TypeScript and probably other languages, but why did the Haskell committee chose ADTs over them?
Why is it called a sum type? You could think of it as being an 'or' type, where the number of states it could be in is defined by summing them all together. Think of a boolean, it's a classic sum type. A boolean's possible set of values is limited to true or false.
SUM returns data type INTEGER for an expression with data type INT, SMALLINT, or TINYINT. SUM returns data type BIGINT for an expression with data type BIGINT. SUM returns data type DOUBLE for an expression with data type DOUBLE. For all other numeric data types, SUM returns data type NUMERIC.
A really popular and useful example of a Sum type in the functional world is the Maybe type. In Haskell, Maybe is a monad that wraps a value and helps you make sure that invalid values are not acted upon, thus allowing you to write safer functions.
The Sealed classes and Record classes provide sum and product types for Java. These algebraic data types are great and they provide a cleaner way to pattern match. The pattern matching provides concise, readable code, and exhaustive type checks by the compiler.
Haskell's sum type is very similar to your :|:
.
The difference between the two is that the Haskell sum type |
is a tagged union, while your "sum type" :|:
is untagged.
Tagged means every instance is unique - you can distunguish Int | Int
from Int
(actually, this holds for any a
):
data EitherIntInt = Left Int | Right Int
In this case: Either Int Int
carries more information than Int
because there can be a Left
and Right
Int
.
In your :|:
, you cannot distinguish those two:
type EitherIntInt = Int :|: Int
How do you know if it was a left or right Int
?
See the comments for an extended discussion of the section below.
Tagged unions have another advantage: The compiler can verify whether you as the programmer handled all cases, which is implementation-dependent for general untagged unions. Did you handle all cases in Int :|: Int
? Either this is isomorphic to Int
by definition or the compiler has to decide which Int
(left or right) to choose, which is impossible if they are indistinguishable.
Consider another example:
type (Integral a, Num b) => IntegralOrNum a b = a :|: b -- untagged
data (Integral a, Num b) => IntegralOrNum a b = Either a b -- tagged
What is 5 :: IntegralOrNum Int Double
in the untagged union? It is both an instance of Integral
and Num
, so we can't decide for sure and have to rely on implementation details. On the other hand, the tagged union knows exactly what 5
should be because it is branded with either Left
or Right
.
As for naming: The disjoint union in Haskell is a union type. ADTs are only a means of implementing these.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With