Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why are primitives not needed for haskell data constructors?

Tags:

haskell

I'm new to haskell prpogramming and learning about the type system and am having trouble grasping what underlies a nullary data constructor..

Take for example the following:

data Color = Red | Green | Blue | Indigo | Violet deriving Show

genColor:: Color
genColor = Red 

From my understanding, Red, Green, Blue.. are nullary data constructors that when used construct a "Color".

What I'm having trouble grasping is, in traditional OOP languages you'd have to specify the primitive underlying the type -- for ex. whether a color is a string, an int, float etc.

In Haskell, the code above runs perfectly fine so why is this not needed? What is the rationale behind structuring the type system like this? Thanks and all help would be appreciated :)

like image 869
Andrew Tang Avatar asked Jun 22 '21 17:06

Andrew Tang


3 Answers

I am reminded of an excellent1 blog post by Alexis King: Types as Axioms. It contrasts the subtractive perspective you're assuming ("I define an enum type to restrict the set of integer values a variable may take on, and give those integers names for clarity") with the additive perspective that algebraic data types are really about ("Since I'm defining a brand-new type, I must also define brand-new values to inhabit it, which have no particular relation to values of any other types").


1 Redundant, since all her blog posts are excellent

like image 75
amalloy Avatar answered Sep 22 '22 15:09

amalloy


Haskell is just saving you from the need to write boilerplate like

# Python
class Color(Enum):
    Red = "Red"
    Green = "Green"
    Blue = "Blue"
    Indigo = "Indigo"
    Violet = "Violet"

Instead, Red et al are the values; they are effectively user-defined literals for the type Color. (Any other association with primitive machine-type values does not exist in the abstraction provided by Haskell itself.)

Any other "underlying" value is not a property of the type itself, but rather a projection supplied by a typeclass instance, such as one for Enum:

-- e.g. fromEnum Red == 0
-- toEnum is necessarily partial, since
-- Color has a strictly smaller cardinality than Int.
-- fromEnum is not surjective for the same reason.
instance Enum Color where
    fromEnum Red = 0
    fromEnum Green = 1
    fromEnum Blue = 2
    fromEnum Indigo = 3
    fromEnum Violet = 4
    toEnum 0 = Red
    toEnum 1 = Green
    toEnum 2 = Blue
    toEnum 3 = Indigo
    toEnum 4 = Violet

or Show:

-- e.g. show Red == "Red"
instance Show Color where
    show Red = "Red"
    show Green = "Green"
    show Blue = "Blue"
    show Indigo = "Indigo"
    show Violet = "Violet"

Such "obvious" instances can be derived by the compiler automatically, of course, but can be omitted entirely when you truly do not care if a Color value has an associated String or Int value.

like image 20
chepner Avatar answered Sep 21 '22 15:09

chepner


Many languages have primitive composite types, like arrays, or structures with named fields, or tuples, or what have you.

The primary primitive composite type in Haskell is the algebraic data type or ADT. It is a "sum of products" or a "tagged union of structures". OOP languages usually have an obvious primitive product (structure) type. In some languages, like C++, there literally is a struct primitive, but in most OOP languages, classes themselves serve as the obvious product type -- you have a bunch of fields defined by the class, and an object is a value that's a product of values for each field.

OOP languages seldom have an obvious primitive sum (tagged union) type. However, there's an implicit one. Every time you call a method on an object and rely on dispatch based on the object's class, you're essentially using a sum (tagged union) type. You have a variable whose type is an interface or an abstract superclass (depending on the language), and the variable's value is an object whose class implements the interface or inherits from the superclass. When you call a method on the variable, the class is consulted to dispatch to the right method. Equivalently, you have a variable whose type is a sum (tagged union); the variable's value corresponds to one component of the sum (member of the tagged union); and when you call a method on the variable, the tag is consulted to determine which component of the sum (member of the tagged union) you actually have, so the correct method implementation can be called.

This analogy isn't as crazy as it appears. Picture in your mind an OOP implementation of a binary tree with integer values in the leaves. You are probably imagining Node and Leaf classes implementing a common Tree interface that provides, say, traversal methods. Now, consider a Haskell implementation:

data Tree = Leaf Int | Node Tree Tree
            ^^^^-------^^^^- the classes
     ^^^^- the interface


traverse :: Tree -> [Int]
^^^^^^^^- the method
traverse (Leaf x) = [x]
traverse (Node l r) = traverse l ++ traverse r

In practice, many OOP designs are naturally mapped to Haskell this way, with interfaces, classes, and methods mapping to data types, constructors, and plain old functions.

What does this have to do with your question? Well, if you have a data type like:

data Color = Red | Green | Blue | Indigo | Violet

you shouldn't be thinking of Color as an alias for some other primitive type (e.g., integers) with some mapping of its values Red through Violet to values 1 through 5. This would be like considering the OOP interface Tree as being an alias for some primitive type (what, doubles? strings?) with aliased "values" Node and Leaf corresponding to primitive values like 2.0 or "hello".

Instead, think of Color as the primitive composite type it is -- a sum of products or tagged union of structures, where the products/structures happen to be nullary (with zero fields). As a result, all that's left are the tags for the tagged sum (like Red and so on). There's no "other" primitive type needed for the representation, because the composite sum-of-products primitive type is the underlying primitive type.

like image 41
K. A. Buhr Avatar answered Sep 18 '22 15:09

K. A. Buhr