This is probably impossible, but I feel like I might learn something, or there might be a wildly different way to approach it.
I'm writing some code that has some physics simulation elements, and I might be dealing in a bunch of different units. I feel like it's worth making the type system do some work for me on this, so that I can't, for instance, add mass to distance or something like that.
That would be easy enough:
module Mass : sig
type t
val sum : t -> t -> t
...
end = struct
type t = int
let sum = +
...
end
module Distance : sig
type t
val sum : t -> t -> t
...
end = struct
type t = int
let sum = +
...
end
And now the compiler will stop me from trying to mix the two (right?). Something similar should also work to express units of the same type (like, pounds vs. kilograms), and maybe even keep me safe from some precision or overflow errors. So far so easy. The interesting/difficult part is that I'd like to make a fluid framework for handling combinations of units, like meters per second squared, etc.
I can come kind of close by playing with functors:
module MakeProduct(F : UnitT)(S: UnitT) = struct
type t = (F.t, S.t)
...
end
module MakeRatio(Numerator : UnitT)(Denominator: UnitT) = struct
type t = (Numerator.t, Denominator.t)
...
end
and then I could just have
module MetersPerSecondSquared = MakeRatio(MakeRatio(Meter)( Second))(Second)
There would be some very clunky function names, but that should give me a type-safe system in which I can multiply 25 m/s^2
by 5s
and get 125m/s.
The problem I see, other than clunkiness, is that the system won't be able to recognize types that express the same things in different order. For instance, I could also express the above as:
MakeRatio(Meter)(Product(Second)(Second))
Both should ultimately express the same concept, but I don't know of a way to tell the type system that these are the same, or that you should still be able to multiply the second by 5s
and get a result in m/s
.
What I'm trying to learn is:
It is possible to do some limited type-level arithmetic with the right encoding. However, any encoding will be limited by the fact that OCaml type system does not know about arithmetic, and cannot be tricked into proving complicated arithmetic theorem by itself.
One encoding that may strike a good compromise between complexity and features is to use a fixed set of core units (for instance m
, s
and kg
) and a phantom type that describes the units of a float.
module Units: sig
type 'a t
val m: <m: ?one; s: ?zero; kg: ?zero> t
end = struct
type 'a t = float
let m = 1.
end
Here the type <m:'m; s:'s; kg:'kg> Units.t
is essentially a float augmented with some a type parameter <m:'m; s:'s; kg:'kg>
that describe the units exponents of each base units.
We want this exponent to be a type-level integer (so ?zero should be a type-level encoding for 0 etc...).
One useful encoding for integers is to encode them as translation rather on top of unary integer.
First, we can define an unary z
(for zero
) type and a successor function at the type level:
type z = Zero
type 'a succ = S
Then we can encode zero
as the function that maps integer n
to n
:
type 'n zero = 'n * 'n
and one
as the function that maps integer n
to n + 1
:
type 'n one = 'n * 'n succ
With this encoding, we can fill out the ?zero
and ?one
placeholder in the Unit
module:
module Unit: sig
type +'a t
(* Generators *)
val m: <m:_ one; s:_ zero; kg:_ zero> t
val s: <m:_ zero; s:_ one; kg:_ zero> t
val kg: <m:_ zero; s:_ zero; kg:_ one> t
...
end
Then we can use our translation encoding to trick the typechecker into computing addition through type unification:
val ( * ):
<m:'m_low * 'm_mid; s:'s_low * 's_mid; kg:'kg_low * 'kg_mid> t ->
<m:'m_mid * 'm_high; s:'s_mid * 's_high; kg:'kg_mid * 'kg_high> t ->
<m:'m_low * 'm_high; s:'s_low * 's_high; kg:'kg_low * 'kg_high> t
Here, if we look at what is happening on each component, we are essentially stating that if we have a translation from 'm_low
to 'm_mid
and another translation from 'm_mid
to m_high
, the sum of those two translations is the translation from 'm_low
to 'm_high
. Thus, we have implemented addition at the type-level.
Putting everything together, we end up with
module Unit: sig
type +'a t
(* Generators *)
(* Floats are dimensionless *)
val scalar: float -> <m:_ zero; s: _ zero; kg: _ zero> t
val float: <m:_ zero; s: _ zero; kg: _ zero> t -> float
(* Base units *)
val m: <m:_ one; s:_ zero; kg:_ zero> t
val s: <m:_ zero; s:_ one; kg:_ zero> t
val kg: <m:_ zero; s:_ zero; kg:_ one> t
(* Arithmetic operations *)
val ( + ): 'a t -> 'a t -> 'a t
val ( * ):
<m:'m_low * 'm_mid; s:'s_low * 's_mid; kg:'kg_low * 'kg_mid> t ->
<m:'m_mid * 'm_high; s:'s_mid * 's_high; kg:'kg_mid * 'kg_high> t ->
<m:'m_low * 'm_high; s:'s_low * 's_high; kg:'kg_low * 'kg_high> t
val ( / ) :
<m:'m_low * 'm_high; s:'s_low * 's_high; kg:'kg_low * 'kg_high> t ->
<m:'m_mid * 'm_high; s:'s_mid * 's_high; kg:'kg_mid * 'kg_high> t ->
<m:'m_low * 'm_mid ; s:'s_low * 's_mid ; kg:'kg_low * 'kg_mid > t
end = struct
type +'a t = float
let scalar x = x let float x = x
let ( + ) = ( +. ) let ( * ) = ( *. ) let ( / ) = ( /. )
let m = 1. let s = 1. let kg = 1.
end
Then we get the expected behavior: only values with the same dimension can added (or subtracted), multiplying values is done by adding the dimensional components (and the reverse for division). For instance, this code compiles correctly
open Units
let ( *. ) x y = scalar x * y
let au = 149_597_870_700. *. m
let c = 299_792_458. *. m / s
let year = 86400. *. (365. *. s)
let ok = float @@ (c * year) / au
whereas trying to add an astronomical unit to a year yields a type error
let error = year + au
Error: This expression has type < kg : 'a * 'a; m : 'b * 'b succ; s : 'c * 'c > Unit.t but an expression was expected of type < kg : 'a * 'a; m : 'b * 'b; s : 'd * 'd succ > Unit.t The type variable 'b occurs inside 'b succ
However, the type error is not really understandable ... which is a common problem with using an encoding.
There is another important limitation with such encoding is that we are using the unification of type variable to do computation. By doing so, whenever the type variable has not been generalized, we are consuming it when doing the computation. This leads to strange errors. For instance, this function works fine
let strange_but_ok x y = m * x + ((y/m) * m) * m
whereas this one does not typecheck
let strange_and_fail x = m * x + ((x/m) * m) * m
Fortunately, since our phantom type parameter is covariant, the relaxed value restrictions will ensure that most of the time the type variables are generalized on time; and the issue will only manifest when mixing together function arguments of different dimensions.
Another important limitation of this encoding is that we are limited to additions, subtractions, multiplications and divisions of units. It is for instance impossible to compute a square root with this representation.
A way to lift this limitation is to still use a phantom type parameters for units, represent addition, multiplication and more using type constructors, and add some axiomatic equality between those type constructors. But then the user has to prove by hand the equivalence between different representations of the same integer.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With