Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When should extensible variant types be used in OCaml?

Tags:

ocaml

I took a course on OCaml before extensible variant types were introduced, and I don't know much about them. I have several questions:

  1. (This question was deleted because it attracted a "not answerable objectively" close vote.)
  2. What are the low-level consequences of using EVTs, such as performance, memory representation, and (un-)marshaling?

Note that my question is about extensible variant type specifically, unlike the question suggested as identical to this one (that question was asked prior to the introduction of EVTs!).

like image 513
Pteromys Avatar asked Feb 17 '19 05:02

Pteromys


1 Answers

Extensible variants are quite different from standard variants in term of runtime behavior.

In particular, extension constructors are runtime values that lives inside the module where they were defined. For instance, in

 type t = ..
 module M = struct
   type t +=A
 end
 open M

the second line define a new extension constructor value A and add it to the existing extension constructors of M at runtime. Contrarily, classical variants do not really exist at runtime.

It is possible to observe this difference by noticing that I can use a mli-only compilation unit for classical variants:

 (* classical.mli *)
 type t = A

 (* main.ml *)
 let x = Classical.A

and then compile main.ml with

ocamlopt classical.mli main.ml

without troubles because there are no value involved in the Classical module.

Contrarily with extensible variants, this is not possible. If I have

 (* ext.mli *)
  type t = ..
  type t+=A

 (* main.ml *)
 let x = Ext.A

then the command

ocamlopt ext.mli main.ml

fails with

Error: Required module `Ext' is unavailable

because the runtime value for the extension constructor Ext.A is missing.

You can also peek at both the name and the id of the extension constructor using the Obj module to see those values

 let a = [%extension_constructor A]
 Obj.extension_name a;;
  • : string = "M.A"
 Obj.extension_id a;;
  • : int = 144

(This id is quite brittle and its value it not particurlarly meaningful.) An important point is that extension constructor are distinguished using their memory location. Consequently, constructors with n arguments are implemented as block with n+1 arguments where the first hidden argument is the extension constructor:

type t += B of int
let x = B 0;;

Here, x contains two fields, and not one:

 Obj.size (Obj.repr x);;
  • : int = 2

And the first field is the extension constructor B:

 Obj.field (Obj.repr x) 0 == Obj.repr [%extension_constructor B];;
  • : bool = true

The previous statement also works for n=0: extensible variants are never represented as a tagged integer, contrarily to classical variants.

Since marshalling does not preserve physical equality, it means that extensible sum type cannot be marshalled without losing their identity. For instance, doing a round trip with

 let round_trip (x:'a):'a = Marshall.from_string (Marshall.to_string x []) 0

then testing the result with

  type t += C
  let is_c = function
  | C -> true
  | _ -> false

leads to a failure:

   is_c (round_trip C)
  • : bool = false

because the round-trip allocated a new block when reading the marshalled value This is the same problem which already existed with exceptions, since exceptions are extensible variants.

This also means that pattern-matching on extensible type is quite different at runtime. For instance, if I define a simple variant

 type s = A of int | B of int

and define a function f as

let f = function
| A n | B n -> n

the compiler is smart enough to optimize this function to simply accessing the the first field of the argument.

You can check with ocamlc -dlambda that the function above is represented in the Lambda intermediary representation as:

(function param/1008 (field 0 param/1008)))

However, with extensible variants, not only we need a default pattern

   type e = ..
   type e += A of n | B of n
   let g = function
   | A n | B n -> n
   | _ -> 0

but we also need to compare the argument with each extension constructor in the match leading to a more complex lambda IR for the match

 (function param/1009
   (catch
     (if (== (field 0 param/1009) A/1003) (exit 1 (field 1 param/1009))
       (if (== (field 0 param/1009) B/1004) (exit 1 (field 1 param/1009))
         0))
    with (1 n/1007) n/1007)))

Finally, to conclude with an actual example of extensible variants, in OCaml 4.08, the Format module replaced its string-based user-defined tags with extensible variants.

This means that defining new tags looks like this:

First, we start with the actual definition of the new tags

 type t =  Format.stag = ..
 type Format.stag += Warning | Error

Then the translation functions for those new tags are

let mark_open_stag tag =
match tag with
| Error -> "\x1b[31m" (* aka print the content of the tag in red *)
| Warning -> "\x1b[35m" (* ... in purple *)
| _ -> ""

let mark_close_stag _tag =
  "\x1b[0m" (*reset *)

Installing the new tag is then done with

 let enable ppf =
    Format.pp_set_tags ppf true;
    Format.pp_set_mark_tags ppf true;
    Format.pp_set_formatter_stag_functions ppf
    { (Format.pp_get_formatter_stag_functions ppf ()) with
    mark_open_stag; mark_close_stag }

With some helper function, printing with those new tags can be done with

 Format.printf "This message is %a.@." error "important"
 Format.printf "This one %a.@." warning "not so much"

Compared with string tags, there are few advantages:

  • less room for a spelling mistake
  • no need to serialize/deserialize potentially complex data
  • no mix up between different extension constructor with the same name.
  • chaining multiple user-defined mark_open_stag function is thus safe: each function can only recognise their own extension constructors.
like image 85
octachron Avatar answered Oct 11 '22 19:10

octachron