What do Haskell (data) constructors construct?

Question

Haskell enables one to construct algebraic data types using type constructors and data constructors. For example,

data Circle = Circle Float Float Float

and we are told this data constructor (Circle on the right) is a function that constructs a circle when given data, e.g. x, y, radius.

Circle :: Float -> Float -> Float -> Circle

My questions are:

What is actually constructed by this function, specifically?
Can we define the constructor function?

I've seen Smart Constructors but they just seem to be extra functions that eventually call the regular constructors.

Coming from an OO background, constructors, of course, have imperative specifications. In Haskell, they seem to be system-defined.

chepner · Accepted Answer

In Haskell, without considering the underlying implementation, a data constructor creates a value, essentially by fiat. “ ‘Let there be a Circle’, said the programmer, and there was a Circle.” Asking what Circle 1 2 3 creates is akin to asking what the literal 1 creates in Python or Java.

A nullary constructor is closer to what you usually think of as a literal. The Bool type is literally defined as

data Bool = False | True

where True and False are data constructors, not literals defined by Haskell grammar.

The data declaration is also the definition of the constructor; since there is nothing to a value beyond the constructor name and its arguments, simply stating it is the definition. You create a value of type Circle by applying the data constructor Circle to three arguments, and that's it.
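As a minimal sketch of that point (the names unitCircle, radius and rendered are illustrative and not part of the question), the declaration alone is enough to build a value and take it apart again, and the derived Show instance prints nothing but the constructor name and its arguments:

```haskell
-- The declaration below is the entire "definition" of the constructor.
data Circle = Circle Float Float Float
  deriving (Show, Eq)

-- Building a value: apply the constructor to three Floats.
unitCircle :: Circle
unitCircle = Circle 0 0 1

-- Taking it apart: pattern match on the same constructor.
radius :: Circle -> Float
radius (Circle _ _ r) = r

-- The derived Show instance renders just the constructor and its arguments.
rendered :: String
rendered = show (Circle 1 2 3)   -- "Circle 1.0 2.0 3.0"
```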

A so-called "smart constructor" is just a function that calls a data constructor, with perhaps some other logic to restrict which values can be created. For example, consider a simple wrapper around Integer:

newtype PosInteger = PosInt Integer

The constructor is PosInt; a smart constructor might look like

mkPosInt :: Integer -> PosInteger
mkPosInt n | n > 0 = PosInt n
           | otherwise = error "Argument must be positive"

With mkPosInt, there is no way to create a PosInteger value with a non-positive argument, because only positive arguments actually reach the data constructor. A smart constructor makes the most sense when it, and not the data constructor, is exported by a module, so that a typical user cannot create arbitrary values (because the data constructor is not in scope outside the module).
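One possible variation, sketched here under the assumption that callers would rather handle failure than hit a runtime error, returns Maybe instead of calling error. The names mkPosIntMaybe and getPosInt are hypothetical, and the export list is only described in a comment so that the snippet loads as a single file:

```haskell
-- In a real module the export list would name PosInteger (without its
-- constructor), mkPosIntMaybe and getPosInt, keeping PosInt private.
newtype PosInteger = PosInt Integer
  deriving (Show, Eq)

-- Smart constructor: only positive arguments ever reach PosInt.
mkPosIntMaybe :: Integer -> Maybe PosInteger
mkPosIntMaybe n
  | n > 0     = Just (PosInt n)
  | otherwise = Nothing

-- Accessor, needed by clients because they cannot pattern match on PosInt.
getPosInt :: PosInteger -> Integer
getPosInt (PosInt n) = n
```

The caller is then forced by the type to deal with the Nothing case, rather than discovering the restriction at runtime.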

K. A. Buhr · Answer

Good question. As you know, given the definition:

data Foo = A | B Int

this defines a type with a (nullary) type constructor Foo and two data constructors, A and B.

Each of these data constructors, when fully applied (to no arguments in the case of A and to a single Int argument in the case of B) constructs a value of type Foo. So, when I write:

a :: Foo
a = A

b :: Foo
b = B 10

the names a and b are bound to two values of type Foo.

So, data constructors for type Foo construct values of type Foo.

What are values of type Foo? Well, first of all, they are different from values of any other type. Second, they are wholly defined by their data constructors. There is a distinct value of type Foo, different from all other values of Foo, for each combination of a data constructor with a set of distinct arguments passed to that data constructor. That is, two values of type Foo are identical if and only if they were constructed with the same data constructor given identical sets of arguments. ("Identical" here means something different from "equality", which may not necessarily be defined for a given type Foo, but let's not get into that.)

That's also what makes data constructors different from functions in Haskell. If I have a function:

bar :: Int -> Bool

It's possible that bar 1 and bar 2 might be exactly the same value. For example, if bar is defined by:

bar n = n > 0

then it's obvious that bar 1 and bar 2 (and bar 3) are identically True. Whether the value of bar is the same for different values of its arguments will depend on the function definition.

In contrast, if Bar is a constructor:

data BarType = Bar Int

then it's never going to be the case that Bar 1 and Bar 2 are the same value. By definition, they will be different values (of type BarType).
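A small sketch of the contrast, assuming we derive Eq so that the comparison can be written down at all (the derived instance compares constructor and arguments structurally, which matches the notion of "identical" used above). The names sameResult, differentValues and sameValue are made up for illustration:

```haskell
data BarType = Bar Int
  deriving (Eq, Show)

-- An ordinary function: different arguments may map to the same result.
bar :: Int -> Bool
bar n = n > 0

sameResult :: Bool
sameResult = bar 1 == bar 2        -- both sides evaluate to True

-- A constructor: different arguments always give different values.
differentValues :: Bool
differentValues = Bar 1 /= Bar 2

-- Same constructor, same argument: the same value.
sameValue :: Bool
sameValue = Bar 1 == Bar 1
```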

By the way, the idea that constructors are just a special kind of function is a common viewpoint. I personally think this is inaccurate and causes confusion. While it's true that constructors can often be used as if they are functions (specifically that they behave very much like functions when used in expressions), I don't think this view stands up under much scrutiny -- constructors are represented differently in the surface syntax of the language (with capitalized identifiers), can be used in contexts (like pattern matching) where functions cannot be used, are represented differently in compiled code, etc.

So, when you ask "can we define the constructor function", the answer is "no", because there is no constructor function. Instead, a constructor like A or B or Bar or Circle is what it is -- something different from a function (that sometimes behaves like a function with some special additional properties) which is capable of constructing a value of whatever type the data constructor belongs to.
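Both roles mentioned above, behaving like a function in expressions and appearing in patterns, can be sketched with the Foo type from earlier. The functions describe and manyBs are hypothetical names, and the deriving clause is added only so the results can be compared:

```haskell
data Foo = A | B Int
  deriving (Eq, Show)

-- Constructors in patterns: something an ordinary function cannot do.
describe :: Foo -> String
describe A     = "A with no payload"
describe (B n) = "B carrying " ++ show n

-- A constructor in an expression: B is passed to map like any function.
manyBs :: [Foo]
manyBs = map B [1, 2, 3]
```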

This makes Haskell constructors very different from OO constructors, but that's not surprising since Haskell values are very different from OO objects. In an OO language, you can typically provide a constructor function that does some processing in building the object, so in Python you might write:

class Bar:
    def __init__(self, n):
        self.value = n > 0

and then after:

bar1 = Bar(1)
bar2 = Bar(2)

we have two distinct objects bar1 and bar2 (which would satisfy bar1 != bar2) that have been configured with the same field values and are in some sense "equal". This is sort of halfway between the situation above with bar 1 and bar 2 creating two identical values (namely True) and the situation with Bar 1 and Bar 2 creating two distinct values that, by definition, can't possibly be the "same" in any sense.

You can never have this situation with Haskell constructors. Instead of thinking of a Haskell constructor as running some underlying function to "construct" an object which might involve some cool processing and deriving of field values, you should instead think of a Haskell constructor as a passive tag attached to a value (which may also contain zero or more other values, depending on the arity of the constructor).

So, in your example, Circle 10 20 5 doesn't "construct" an object of type Circle by running some function. It directly creates a tagged object that, in memory, will look something like:

<Circle tag>
<Float value 10>
<Float value 20>
<Float value 5>

(or you can at least pretend that's what it looks like in memory).
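One way to see that no constructor code runs, assuming the fields carry no strictness annotations (and the StrictData extension is off): the constructor stores its arguments without evaluating them, so a field that would crash if inspected does no harm as long as nobody looks at it. The names radius and lazyRadius are illustrative:

```haskell
data Circle = Circle Float Float Float

radius :: Circle -> Float
radius (Circle _ _ r) = r

-- The first two fields are never evaluated, so undefined is never forced.
lazyRadius :: Float
lazyRadius = radius (Circle undefined undefined 5)
```

An OO-style constructor that validated or transformed its arguments would have to inspect them; the data constructor only packages them up next to the tag.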

The closest you can come to OO constructors in Haskell is using smart constructors. As you note, eventually a smart constructor just calls a regular constructor, because that's the only way to create a value of a given type. No matter what kind of bizarre smart constructor you build to create a Circle, the value it constructs will need to look like:

<Circle tag>
<some Float value>
<another Float value>
<a final Float value>

which you'll need to construct with a plain old Circle constructor call. There's nothing else the smart constructor could return that would still be a Circle. That's just how Haskell works.
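As a sketch of such a smart constructor (circleFromDiameter is a hypothetical name, and rejecting negative diameters is an assumed invariant, not something from the question), any processing happens before the plain Circle constructor is applied:

```haskell
data Circle = Circle Float Float Float
  deriving (Eq, Show)

-- Does some "OO-style" work (validation, halving the diameter), but the
-- only way it can produce a Circle is by applying the Circle constructor.
circleFromDiameter :: Float -> Float -> Float -> Maybe Circle
circleFromDiameter x y d
  | d >= 0    = Just (Circle x y (d / 2))
  | otherwise = Nothing
```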

Does that help?

Jon Purdy · Answer

I’m going to answer this in a somewhat roundabout way, with an example that I hope illustrates my point, which is that Haskell decouples several distinct ideas that are coupled in OOP under the concept of a “class”. Understanding this will help you translate your experience from OOP into Haskell with less difficulty. The example in OOP pseudocode:

class Person {

    private int id;
    private String name;

    public Person(int id, String name) {
        if (id == 0)
            throw new InvalidIdException();
        if (name == "")
            throw new InvalidNameException();

        this.name = name;
        this.id = id;
    }

    public int getId() { return this.id; }

    public String getName() { return this.name; }

    public void setName(String name) { this.name = name; }

}

In Haskell:

module Person
  ( Person
  , mkPerson
  , getId
  , getName
  , setName
  ) where

data Person = Person
  { personId :: Int
  , personName :: String
  }

mkPerson :: Int -> String -> Either String Person
mkPerson id name
  | id == 0 = Left "invalid id"
  | name == "" = Left "invalid name"
  | otherwise = Right (Person id name)

getId :: Person -> Int
getId = personId

getName :: Person -> String
getName = personName

setName :: String -> Person -> Either String Person
setName name person = mkPerson (personId person) name

Notice:

The Person class has been translated to a module which happens to export a data type by the same name—types (for domain representation and invariants) are decoupled from modules (for namespacing and code organisation).
The fields id and name, which are specified as private in the class definition, are translated to ordinary (public) fields on the data definition, since in Haskell they’re made private by omitting them from the export list of the Person module—definitions and visibility are decoupled.
The constructor has been translated into two parts: one (the Person data constructor) that simply initialises the fields, and another (mkPerson) that performs validation—allocation & initialisation and validation are decoupled. Since the Person type is exported, but its constructor is not, this is the only way for clients to construct a Person—it’s an “abstract data type”.
The public interface has been translated to functions that are exported by the Person module, and the setName function that previously mutated the Person object has become a function that returns a new instance of the Person data type that happens to share the old ID. The OOP code has a bug: it should include a check in setName for the name != "" invariant; the Haskell code can avoid this by using the mkPerson smart constructor to ensure that all Person values are valid by construction. So state transitions and validation are also decoupled—you only need to check invariants when constructing a value, because it can’t change thereafter.
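A short usage sketch of these points. It repeats the definitions above in a single file, without the module header, with the id parameter renamed to pid to avoid shadowing Prelude's id, and with a deriving clause added so results can be compared. The values alice, renamed and rejected are illustrative names:

```haskell
data Person = Person
  { personId :: Int
  , personName :: String
  } deriving (Eq, Show)

mkPerson :: Int -> String -> Either String Person
mkPerson pid name
  | pid == 0 = Left "invalid id"
  | name == "" = Left "invalid name"
  | otherwise = Right (Person pid name)

-- Goes back through mkPerson, so the name invariant is re-checked.
setName :: String -> Person -> Either String Person
setName name person = mkPerson (personId person) name

alice :: Either String Person
alice = mkPerson 1 "Alice"

-- A valid rename yields a new value that keeps the old ID.
renamed :: Either String Person
renamed = alice >>= setName "Alicia"

-- An invalid rename is rejected instead of producing a broken Person.
rejected :: Either String Person
rejected = alice >>= setName ""
```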

So as for your actual questions:

What is actually constructed by this function, specifically?

A constructor of a data type allocates space for the tag and fields of a value, sets the tag to indicate which constructor was used to create the value, and initialises the fields to the arguments of the constructor. You can’t override it because the process is completely mechanical and there’s no reason (in normal safe code) to do so. It’s an internal detail of the language and runtime.

Can we define the constructor function?

No—if you want to perform additional validation to enforce invariants, you should use a “smart constructor” function which calls the lower-level data constructor. Because Haskell values are immutable by default, values can be made correct by construction; that is, when you don’t have mutation, you don’t need to enforce that all state transitions are correct, only that all states themselves are constructed correctly. And often you can arrange your types so that smart constructors aren’t even necessary.
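As a sketch of that last remark (the type NonEmptyList and its functions are hypothetical names, though base ships a similar type in Data.List.NonEmpty): instead of validating a list with a smart constructor, the type can be shaped so that the invalid state cannot be written down at all.

```haskell
-- A list that always has at least one element: the first field is mandatory.
data NonEmptyList a = NonEmptyList a [a]
  deriving (Eq, Show)

-- Total function: there is no empty case, so no runtime check is needed.
firstElem :: NonEmptyList a -> a
firstElem (NonEmptyList x _) = x

-- Convert back to an ordinary list.
toList :: NonEmptyList a -> [a]
toList (NonEmptyList x xs) = x : xs
```

Here the plain data constructor can be exported safely, because every value it can build already satisfies the invariant.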

The only thing you can change about the generated data constructor “function” is making its type signature more restrictive using GADTs, to help enforce more invariants at compile-time. And as a side note, GADTs also let you do existential quantification, which lets you carry around encapsulated/type-erased information at runtime, exactly like an OOP vtable—so this is another thing that’s decoupled in Haskell but coupled in typical OOP languages.
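A minimal sketch of both uses, assuming GHC with the GADTs extension enabled; the types Expr and Showable and all function names are illustrative. In Expr each constructor's result type is restricted, so ill-typed expressions are rejected at compile time; in Showable the constructor hides the wrapped type and carries its Show dictionary along:

```haskell
{-# LANGUAGE GADTs #-}

-- Each constructor states its own, more restrictive, result type.
data Expr a where
  IntLit  :: Int  -> Expr Int
  BoolLit :: Bool -> Expr Bool
  Add     :: Expr Int -> Expr Int -> Expr Int
  If      :: Expr Bool -> Expr a -> Expr a -> Expr a

-- No runtime type checks needed: Add (BoolLit True) ... does not compile.
eval :: Expr a -> a
eval (IntLit n)  = n
eval (BoolLit b) = b
eval (Add x y)   = eval x + eval y
eval (If c t e)  = if eval c then eval t else eval e

-- Existential quantification: the wrapped type is hidden, only Show remains.
data Showable where
  MkShowable :: Show a => a -> Showable

showOne :: Showable -> String
showOne (MkShowable x) = show x

showAll :: [Showable] -> [String]
showAll = map showOne
```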

Long story short (too late), you can do all the same things, you just arrange them differently, because Haskell provides the various features of OOP classes under separate orthogonal language features.

Tags: functional-programming, haskell, algebraic-data-types

Ashley Aitken
