Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When to expose constructors of a data type when designing data structures?

When designing data structures in functional languages there are 2 options:

  1. Expose their constructors and pattern match on them.
  2. Hide their constructors and use higher-level functions to examine the data structures.

In what cases, what is appropriate?

Pattern matching can make code much more readable or simpler. On the other hand, if we need to change something in the definition of a data type then all places where we pattern-match on them (or construct them) need to be updated.


I've been asking this question myself for some time. Often it happens to me that I start with a simple data structure (or even a type alias) and it seems that constructors + pattern matching will be the easiest approach and produce a clean and readable code. But later things get more complicated, I have to change the data type definition and refactor a big part of the code.

like image 283
Petr Avatar asked Sep 06 '12 14:09

Petr


People also ask

What is a type constructor in Haskell?

In a data declaration, a type constructor is the thing on the left hand side of the equals sign. The data constructor(s) are the things on the right hand side of the equals sign. You use type constructors where a type is expected, and you use data constructors where a value is expected.

How do you define data types in Haskell?

The Data Keyword and Constructors In general, we define a new data type by using the data keyword, followed by the name of the type we're defining. The type has to begin with a capital letter to distinguish it from normal expression names. To start defining our type, we must provide a constructor.

How do types work in Haskell?

In Haskell, every statement is considered as a mathematical expression and the category of this expression is called as a Type. You can say that "Type" is the data type of the expression used at compile time.


1 Answers

The essential factor for me is the answer to the following question:

Is the structure of my datatype relevant to the outside world?

For example, the internal structure of the list datatype is very much relevant to the outside world - it has an inductive structure that is certainly very useful to expose to consumers, because they construct functions that proceed by induction on the structure of the list. If the list is finite, then these functions are guaranteed to terminate. Also, defining functions in this way makes it easy to provide properties about them, again by induction.

By contrast, it is best for the Set datatype to be kept abstract. Internally, it is implemented as a tree in the containers package. However, it might as well have been implemented using arrays, or (more usefully in a functional setting) with a tree with a slightly different structure and respecting different invariants (balanced or unbalanced, branching factor, etc). The need to enforce any invariants above and over those that the constructors already enforce through their types, by the way, precludes letting the datatype be concrete.

The essential difference between the list example and the set example is that the Set datatype is only relevant for the operations that are possible on Set's. Whereas lists are relevant because the standard library already provides many functions to act on them, but in addition their structure is relevant.

As a sidenote, one might object that actually the inductive structure of lists, which is so fundamental to write functions whose termination and behaviour is easy to reason about, is captured abstractly by two functions that consume lists: foldr and foldl. Given these two basic list operators, most functions do not need to inspect the structure of a list at all, and so it could be argued that lists too coud be kept abstract. This argument generalizes to many other similar structures, such as all Traversable structures, all Foldable structures, etc. However, it is nigh impossible to capture all possible recursion patterns on lists, and in fact many functions aren't recursive at all. Given only foldr and foldl, one would, writing head for example would still be possible, though quite tedious:

head xs = fromJust $ foldl (\b x -> maybe (Just x) Just b) Nothing xs

We're much better off just giving away the internal structure of the list.

One final point is that sometimes the actual representation of a datatype isn't relevant to the outside world, because say it is some kind of optimised and might not be the canonical representation, or there isn't a single "canonical" representation. In these cases, you'll want to keep your datatype abstract, but offer "views" of your datatype, which do provide concrete representations that can be pattern matched on.

One example would be if wanted to define a Complex datatype of complex numbers, where both cartesian forms and polar forms can be considered canonical. In this case, you would keep Complex abstract, but export two views, ie functions polar and cartesian that return a pair of a length and an angle or a coordinate in the cartesian plane, respectively.

like image 181
macron Avatar answered Sep 18 '22 13:09

macron