Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What are row types? Are they algebraic data types?

I often hear that F# lacks support for OCaml row types, that makes the language more powerful than F#.

What are they? Are they algebraic data types, such as sum types (discriminated unions) or product types (tuples, records)? And is it possible to write row types in other dialects, such as F#?

like image 743
MiP Avatar asked Jan 04 '18 09:01

MiP


People also ask

What is a row type?

A row data type is a user-defined type containing an ordered sequence of named fields each with an associated data type.

What is an algebraic type system?

Two common classes of algebraic types are product types (i.e., tuples and records) and sum types (i.e., tagged or disjoint unions, coproduct types or variant types). The values of a product type typically contain several values, called fields. All values of that type have the same combination of field types.

Why are algebraic data types useful?

Just as algebra is fundamental to the whole of mathematics, algebraic data types (ADTs) are fundamental to many common functional programming languages. They're the primitives upon which all of our richer data structures are built, including everything from sets, maps, and queues, to bloom filters and neural networks.


3 Answers

First of all, let's ensure that we use terminology that is consistent with the OCaml type system and corresponding white papers. There is no such thing as "row type" in the type system of OCaml, however, it has "row polymorphism" and we will discuss it below0.

Row polymorphism is a form of polymorphism. OCaml provides two kinds of polymorphism - parametric and row, and lacks the other two - ad hoc and inclusion (aka subtyping)1.

First of all, what is polymoprhism? In the context of type systems, polymorphism allows a single term to have several types. The problem here is that the word type itself is heavily overloaded in the computer science and programming language community. So to minimize the confusion, let's just reintroduce it here, to be on the same page2. A type of a term usually denotes some approximation of the term semantics. Where semantics could be as simple as a set of values equipped with a set of operations or something more complex, like effects, annotations, and arbitrary theories. In general, semantics denotes a set of all possible behaviors of a term. A type system denotes a set of rules, that allows some language constructs and disallows others based on their types. I.e., it verifies that compositions of terms behave correctly. For example, if there is a function application construct in a language the type system will allow an application only to those arguments that have types that match the types of parameters. And that's where polymorphism comes into play. In monomorphic type systems, this match could be only one to one, i.e., literal. Polymorphic type systems provide mechanisms to specify some regular expression that will match with a family of types. So, different kinds of polymorphism are simply different kinds of regular expressions that you may use to denote the family of types.

Now let's look at different kinds of polymorphism from this perspective. For example, parametric polymorphism is like a dot in regular expressions. E.g., 'a list is . list - that means we match literally with list and a parameter of the list type could be any type. The row polymorphism is a star operator, e.g., <quacks : unit; ..> is the same as <quacks : unit; .*>. And it means that it matches with any type that quacks and does whatever else3. Speaking of nominal subtyping, in this case, we have nominal classes (aka character classes in regexp), and we specify a family of types with the name of their base class. E.g., duck is like [:duck:] and any value that is properly registered as a member of class matches with this type (via class inheritance and the new operator)4. Finally, ad-hoc polymorphism is in fact also nominal and maps to character classes in regular expressions. The main difference here is that the notion of type in ad-hoc polymorphism is applied not to a value, but rather to the name. So a name, like a function name or the + operator, may have multiple definitions (implementations) that should be statically registered using some language mechanism (e.g., overloading an operator, implementing a method, etc). So, ad-hoc polymorphism is just a special case of nominal subtyping.

Now, when we are clear, we can discuss what row polymorphism gives us. Row polymorphism is a feature of structural type systems (also known as duck typing in dynamically typed languages) as contrasted to nominal type systems, which provide subtyping polymorphism. In general, as we discussed above, it allows us to specify, a type as "anything that quacks" as opposed to "anything that implements the IDuck interface". So yes, you can, of course, do the same with the nominal typing by defining the duck interface and explicitly registering all implementations as instances of this interface using some inherit or implements mechanisms. But the main problem here is that your hierarchy is sealed, i.e., you need to change your code to register an implementation in a newly created interface. That breaks the open/closed principle and hampers code reuse. Another problem with the nominal subtyping is that unless your hierarchy forms a lattice (i.e., for any two classes there is always a least upper bound) you can't implement type inference on it5.

Further Reading

  • Objective ML: An effective object-oriented extension to ML - a comprehensive description of the topic.

  • François Pottier and Didier Rémy. The Essence of ML Type Inference. In Benjamin C. Pierce, editor, Advanced Topics in Types and Programming Languages, MIT Press, 2005. - See section 10.8 for a very thorough and detailed explanation of rows.

  • Simple Type Inference for Structural Polymorphism - for a detailed explanation of the interaction between structural and row polymorphism in the presence of type inference.

---- 0) As was pointed in comments by @nekketsuuu, I was using the terminology a little bit voluntaristic, as my intention was to give an easy-to-understand and high-level idea, without going deep into details. I've revised the post since then, to make it a little bit more strict.

1) Yet OCaml provides classes with inheritance and a notion of subtype, it still not a subtyping polymorphism according to the common definition, as it's not nominal. It should come more clear from the rest of the answer.

2) I'm just fixing the terminology, I'm not claiming that my definition is right. Many people think that type denotes a representation of a value, and historically this is correct.

3) Perhaps a better regexp would be <.*; quacks : unit; .*> but I think you got the idea.

4) Thus OCaml doesn't have subtyping polymorphism, although it has a notion of subtype. When you specify a type it will not match with the subtype, it will only match literally, and you need to use an explicit upcasting operator to make a value of type T to be applicable in a context where super(T) is expected. So although there is subtyping in OCaml it is not about polymorphism.

5) And although the lattice requirement doesn't look impossible, it is hard in real life to impose this restriction on hierarchies, or if it is imposed the precision of the type inference will be always bound with the precision of the type hierarchy. So in practice, it doesn't work, cf. Scala

(skip this note on a first read) Though in OCaml there exist row variables that are used to embed row polymorphism into OCaml type inference that has only parametric polymorphism.

‡) Often the word typing is used interchangeably with the type system to refer to a particular set of rules in the overall type system. For example, sometimes we say "OCaml has row typing" to denote the fact, that the OCaml type system provides rules for "row polymorphism".

like image 103
ivg Avatar answered Oct 09 '22 13:10

ivg


Row types are weird. And very powerful.

Row types are used to implement objects and polymorphic variants in OCaml.

But first, here's what we cannot do without row types:

type t1 = { a : int; b : string; }
type t2 = { a : int; c : bool; }

let print_a x = print_int x.a

let ab = { a = 42; b = "foo"; }
let ac = { a = 123; c = false; }

let () =
 print_a ab;
 print_a ac

This code will of course refuse to compile, because print_a must have a unique type: either t1, or t2, but not both. However, in some cases, we may want that exact behavior. That's what row types are for. That's what they do: a more "flexible" type.

In OCaml, there are two main uses of row types: objects and polymorphic variants. In terms of algebra, objects give you "row product" and polymorphic variants "row sum".

What's to note about row types is that you can end up with some subtyping to declare, and very counter intuitive typing and semantics (notably in the case classes).

You can check this paper for more details.

like image 45
PatJ Avatar answered Oct 09 '22 12:10

PatJ


I'll complete PatJ's excellent answer with his example, written using classes.

Given the classes below:

class t1 = object
  method a = 42
  method b = "Hello world"
end

class t2 = object
  method a = 1337
  method b = false
end

And the objects below:

let o1 = new t1
let o2 = new t2

You can write the following:

let print_a t = print_int t#a;;
val print_a : < a : int; .. > -> unit = <fun>

print_a o1;;

42
- : unit = ()

print_a o2;;

1337
- : unit = ()

You can see the row type in print_a's signature. The < a : int; .. > is a type that literally means "any object that has at least a method a with signature int".

like image 9
Richard-Degenne Avatar answered Oct 09 '22 12:10

Richard-Degenne