Hindley-Milner algorithm in Java

I'm working on a simple dataflow-based system (imagine a LabVIEW-like editor/runtime) written in Java. The user can wire blocks together in an editor, and I need type inference to ensure the dataflow graph is correct. However, most type inference examples are written in mathematical notations, ML, Scala, Perl, etc., which I don't "speak".

I read about the Hindley-Milner algorithm and found this document with a nice example I could implement. It works on a set of constraints of the form T1 = T2. However, my dataflow graphs translate to constraints of the form T1 >= T2 (or T2 extends T1, or covariance, or T1 <: T2, as I saw it in various articles). There are no lambdas, just type variables (used in generic functions like T merge(T in1, T in2)) and concrete types.

To recap the HM algorithm:

Type = {TypeVariable, ConcreteType}
TypeRelation = {LeftType, RightType}
Substitution = {OldType, NewType}
TypeRelations = set of TypeRelation
Substitutions = set of Substitution

1) Initialize TypeRelations to the constraints, Initialize Substitutions to empty
2) Take a TypeRelation
3) If LeftType and RightType are both TypeVariables or are concrete 
      types with LeftType <: RightType Then do nothing
4) If only LeftType is a TypeVariable Then
    replace all occurrences of LeftType with RightType in TypeRelations and Substitutions
    put LeftType, RightType into Substitutions
5) If only RightType is a TypeVariable Then
    replace all occurrences of RightType with LeftType in TypeRelations and Substitutions
    put RightType, LeftType into Substitutions
6) Else fail
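
For reference, the equality-only recap above could look roughly like this in Java. This is a minimal sketch under assumptions that are not part of the original recap: types are plain strings, names starting with `?` are type variables, and the class and method names (`EqualityUnifier`, `unify`, `bind`) are made up for illustration. Unlike step 3, it binds one variable to the other when both sides are variables, which is the usual treatment for equality constraints.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

final class EqualityUnifier {
    // Assumption for this sketch: a type is a String, and names starting
    // with '?' are type variables; everything else is a concrete type.
    static boolean isVariable(String type) {
        return type.startsWith("?");
    }

    /** Solves "left = right" constraints (two-element arrays); returns variable -> type. */
    static Map<String, String> unify(Deque<String[]> relations) {
        Map<String, String> substitutions = new LinkedHashMap<>();
        while (!relations.isEmpty()) {
            String[] rel = relations.poll();
            String left = rel[0];
            String right = rel[1];
            if (left.equals(right)) {
                continue; // identical types: nothing to learn
            }
            if (isVariable(left)) {
                bind(left, right, relations, substitutions);
            } else if (isVariable(right)) {
                bind(right, left, relations, substitutions);
            } else {
                throw new IllegalStateException("Cannot unify " + left + " with " + right);
            }
        }
        return substitutions;
    }

    // Replaces every occurrence of the variable with its new type, then records the binding.
    private static void bind(String variable, String type,
                             Deque<String[]> relations, Map<String, String> substitutions) {
        for (String[] rel : relations) {
            if (rel[0].equals(variable)) rel[0] = type;
            if (rel[1].equals(variable)) rel[1] = type;
        }
        substitutions.replaceAll((key, value) -> value.equals(variable) ? type : value);
        substitutions.put(variable, type);
    }
}
```

With the constraints `?T = String` and `?T = ?U`, both variables end up bound to `String`; with `?T = String` and `?T = Integer`, the second constraint becomes `String = Integer` and the sketch throws.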

How can I change the original HM algorithm to work with this kind of relation instead of simple equality relations? A Java-ish example or explanation would be much appreciated.

asked Jul 21 '11 22:07 by akarnokd


2 Answers

I read at least 20 articles and found one I could use (François Pottier: Type inference in the presence of subtyping: from theory to practice):

Input:

Type = { TypeVariable, ConcreteType }
TypeRelation = { Left: Type, Right: Type }
TypeRelations = Deque<TypeRelation>

Helper functions:

ExtendsOrEquals = #(ConcreteType, ConcreteType) => Boolean
Union = #(ConcreteType, ConcreteType) => ConcreteType | fail
Intersection = #(ConcreteType, ConcreteType) => ConcreteType
SubC = #(Type, Type) => List<TypeRelation>

ExtendsOrEquals tells whether the first of two concrete types extends or equals the second, e.g., (String, Object) == true, (Object, String) == false.

Union computes the common subtype of two concrete types if possible, e.g., (Object, Serializable) == Object&Serializable. If there is no common subtype, as in (Integer, String) == null, the operation fails.

Intersection computes the nearest supertype of two concrete types, e.g., (List, Set) == Collection, (Integer, String) == Object.

SubC is the structural decomposition function, which in this simple case will just return a singleton list containing a new TypeRelation of its parameters.
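
As a rough illustration, the three type helpers can be approximated on top of `java.lang.Class`. This is only a sketch under stated assumptions: the class name `TypeHelpers` is hypothetical, `union` only handles the case where one type already extends the other (Java has no `Class` object for a true intersection type such as Comparable&Serializable), and `intersection` follows the superclass chain for classes and the super-interfaces for interfaces.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Optional;

final class TypeHelpers {
    /** True if first extends, implements, or equals second. */
    static boolean extendsOrEquals(Class<?> first, Class<?> second) {
        return second.isAssignableFrom(first);
    }

    /** Common subtype; simplified to "the more specific of the two", empty on failure. */
    static Optional<Class<?>> union(Class<?> a, Class<?> b) {
        if (extendsOrEquals(a, b)) return Optional.of(a);
        if (extendsOrEquals(b, a)) return Optional.of(b);
        return Optional.empty();
    }

    /** Nearest common supertype, searched breadth-first starting from a. */
    static Class<?> intersection(Class<?> a, Class<?> b) {
        Deque<Class<?>> queue = new ArrayDeque<>();
        queue.add(a);
        while (!queue.isEmpty()) {
            Class<?> candidate = queue.poll();
            if (candidate.isAssignableFrom(b)) {
                return candidate;
            }
            if (candidate.isInterface()) {
                // Interfaces have no superclass; climb through their super-interfaces.
                queue.addAll(Arrays.asList(candidate.getInterfaces()));
            } else if (candidate.getSuperclass() != null) {
                // Classes: follow only the superclass chain in this sketch.
                queue.add(candidate.getSuperclass());
            }
        }
        return Object.class; // nothing closer was found
    }
}
```

Following only the superclass chain for classes is a deliberate simplification: it makes `intersection(Integer.class, String.class)` return `Object` as in the example above, rather than a shared interface such as `Comparable`.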

Tracking structures:

UpperBounds = Map<TypeVariable, Set<Type>>
LowerBounds = Map<TypeVariable, Set<Type>>
Reflexives = List<TypeRelation>

UpperBounds keeps track of the types that may be supertypes of a type variable, and LowerBounds keeps track of the types that may be subtypes of a type variable. Reflexives keeps track of the relations between pairs of type variables to help with the bound rewriting in the algorithm.

The algorithm is as follows:

While TypeRelations is not empty, take a relation rel

  [Case 1] If rel is (left: TypeVariable, right: TypeVariable) and 
           Reflexives does not have an entry with (left, right) {

    found1 = false
    found2 = false
    for each ab in Reflexives
      // apply a >= b, b >= c then a >= c rule
      if (ab.right == rel.left)
        found1 = true
        add (ab.left, rel.right) to Reflexives
        union and set upper bounds of ab.left 
          with upper bounds of rel.right

      if (ab.left == rel.right)
        found2 = true
        add (rel.left, ab.right) to Reflexives
        intersect and set lower bounds of ab.right 
          with lower bounds of rel.left

    if !found1
        union and set upper bounds of rel.left
          with upper bounds of rel.right
    if !found2
        intersect and set lower bounds of rel.right
          with lower bounds of rel.left

    add TypeRelation(rel.left, rel.right) to Reflexives

    for each lb in LowerBounds of rel.left
      for each ub in UpperBounds of rel.right
        add all SubC(lb, ub) to TypeRelations
  }
  [Case 2] If rel is (left: TypeVariable, right: ConcreteType) and 
      UpperBound of rel.left does not contain rel.right {
    found = false
    for each ab in Reflexives
      if (ab.right == rel.left)
        found = true
        union and set upper bounds of ab.left with rel.right
    if !found 
        union the upper bounds of rel.left with rel.right
    for each lb in LowerBounds of rel.left
      add all SubC(lb, rel.right) to TypeRelations
  }
  [Case 3] If rel is (left: ConcreteType, right: TypeVariable) and
      LowerBound of rel.right does not contain rel.left {
    found = false
    for each ab in Reflexives
      if (ab.left == rel.right)
         found = true
         intersect and set lower bounds of ab.right with rel.left
    if !found
       intersect and set lower bounds of rel.right with rel.left
    for each ub in UpperBounds of rel.right
       add each SubC(rel.left, ub) to TypeRelations
  }
  [Case 4] If rel is (left: ConcreteType, right: ConcreteType) and 
      !ExtendsOrEquals(rel.left, rel.right) {
    fail
  }

A basic example:

Merge = (T, T) => T
Sink = U => Void

Sink(Merge("String", 1))

The relations of this expression:

String >= T
Integer >= T
T >= U

1.) rel is (String, T); Case 3 is activated. Because Reflexives is empty, the lower bound of T is set to String. T has no upper bounds, so TypeRelations remains unchanged.

2.) rel is (Integer, T); Case 3 is activated again. Reflexives is still empty, so the lower bound of T is set to the intersection of String and Integer, yielding Object. T still has no upper bounds, and TypeRelations is unchanged.

3.) rel is T >= U; Case 1 is activated. Because Reflexives is empty, the upper bounds of T are combined with the upper bounds of U, which remain empty. Then the lower bound of U is set to the lower bound of T, yielding Object >= U. TypeRelation(T, U) is added to Reflexives.

4.) The algorithm terminates with the bounds Object >= T and Object >= U.

In another example, a type conflict is demonstrated:

Merge = (T, T) => T
Sink = Integer => Void

Sink(Merge("String", 1))

The relations:

String >= T
Integer >= T
T >= Integer

Steps 1.) and 2.) are the same as above.

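3.) rel is T >= Integer; Case 2 is activated. The upper bound of T becomes Integer, and because the lower bound of T is Object at this point, the relation SubC(Object, Integer) is added to TypeRelations. Case 4 then rejects it, since Object does not extend Integer, and the algorithm fails.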
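
The two walk-throughs above can be reproduced with a small worklist solver. The following is a simplified sketch, not a line-by-line transcription of the four cases: it keeps a single concrete lower and upper bound per type variable, propagates bounds along the variable-to-variable pairs instead of rewriting Reflexives, and only follows the superclass chain when widening a lower bound. The class name `SubtypeSolver`, its methods, and the convention that a `String` is a type variable while a `Class<?>` is a concrete type are all assumptions made for this illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SubtypeSolver {
    final Map<String, Class<?>> lowerBounds = new HashMap<>();
    final Map<String, Class<?>> upperBounds = new HashMap<>();
    private final List<String[]> reflexives = new ArrayList<>();
    private final Deque<Object[]> relations = new ArrayDeque<>();

    /** Queues the constraint "left extends or equals right". */
    SubtypeSolver relate(Object left, Object right) {
        relations.add(new Object[] {left, right});
        return this;
    }

    void solve() {
        while (!relations.isEmpty()) {
            Object[] rel = relations.poll();
            Object left = rel[0];
            Object right = rel[1];
            if (left instanceof String && right instanceof String) {
                variableToVariable((String) left, (String) right);        // Case 1
            } else if (left instanceof String) {
                variableToConcrete((String) left, (Class<?>) right);      // Case 2
            } else if (right instanceof String) {
                concreteToVariable((Class<?>) left, (String) right);      // Case 3
            } else if (!((Class<?>) right).isAssignableFrom((Class<?>) left)) {
                throw new IllegalStateException(left + " does not extend " + right); // Case 4
            }
        }
    }

    private void variableToVariable(String left, String right) {
        for (String[] ab : reflexives) {
            if (ab[0].equals(left) && ab[1].equals(right)) return; // already known
        }
        reflexives.add(new String[] {left, right});
        if (lowerBounds.containsKey(left)) relate(lowerBounds.get(left), right);
        if (upperBounds.containsKey(right)) relate(left, upperBounds.get(right));
    }

    private void variableToConcrete(String left, Class<?> right) {
        Class<?> old = upperBounds.get(left);
        Class<?> merged = old == null ? right : moreSpecific(old, right);
        if (merged == old) return; // bound unchanged, nothing to propagate
        upperBounds.put(left, merged);
        if (lowerBounds.containsKey(left)) relate(lowerBounds.get(left), merged);
        for (String[] ab : reflexives) {
            if (ab[1].equals(left)) relate(ab[0], merged);
        }
    }

    private void concreteToVariable(Class<?> left, String right) {
        Class<?> old = lowerBounds.get(right);
        Class<?> merged = old == null ? left : nearestSuperclass(old, left);
        if (merged == old) return; // bound unchanged, nothing to propagate
        lowerBounds.put(right, merged);
        if (upperBounds.containsKey(right)) relate(merged, upperBounds.get(right));
        for (String[] ab : reflexives) {
            if (ab[0].equals(right)) relate(merged, ab[1]);
        }
    }

    // Simplified Union: the more specific of two types, or a failure.
    private static Class<?> moreSpecific(Class<?> a, Class<?> b) {
        if (b.isAssignableFrom(a)) return a;
        if (a.isAssignableFrom(b)) return b;
        throw new IllegalStateException("No common subtype of " + a + " and " + b);
    }

    // Simplified Intersection: climbs the superclass chain of a until it covers b.
    private static Class<?> nearestSuperclass(Class<?> a, Class<?> b) {
        Class<?> candidate = a;
        while (candidate != null && !candidate.isAssignableFrom(b)) {
            candidate = candidate.getSuperclass();
        }
        return candidate == null ? Object.class : candidate;
    }
}
```

Feeding it `relate(String.class, "T")`, `relate(Integer.class, "T")` and `relate("T", "U")` leaves `Object` as the lower bound of both T and U. Replacing the last relation with `relate("T", Integer.class)` makes `solve()` throw, because in this sketch the conflict surfaces when the derived relation (Object, Integer) is checked.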

Extensions to the Type system

Adding generic types to the type system requires extending both the main cases and the SubC function.

Type = { TypeVariable, ConcreteType, ParametricType<Type,...> }

Some ideas:

  • If a ConcreteType and a ParametricType meet, that is an error.
  • If a TypeVariable and a ParametricType meet, e.g., T = C(U1,...,Un), then create new type variables and relations T1 >= U1, ... , Tn >= Un and work with them.
  • If two ParametricTypes meet (D<> and C<>), check that D >= C and that the number of type arguments is the same, then extract each pair of arguments as a relation.
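
A hedged sketch of how SubC might decompose parametric types along these lines is shown below. The `Type` class, its fields, and the `extendsOrEquals` predicate on base names are hypothetical stand-ins, and the sketch assumes every type argument is related pairwise in the same direction. The case where a type variable meets a parametric type is not expanded into fresh variables here; the relation is returned as is for the main cases to handle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.BiPredicate;

final class ParametricSubC {
    /** Hypothetical type model: a variable, a concrete type, or (with arguments) a parametric type. */
    static final class Type {
        final String name;
        final boolean variable;
        final List<Type> args;

        Type(String name, boolean variable, Type... args) {
            this.name = name;
            this.variable = variable;
            this.args = Arrays.asList(args);
        }

        boolean isParametric() {
            return !args.isEmpty();
        }

        @Override
        public String toString() {
            return isParametric() ? name + args : name;
        }
    }

    /** Decomposes "left >= right" into simpler relations, each a {left, right} pair. */
    static List<Type[]> subC(Type left, Type right, BiPredicate<String, String> extendsOrEquals) {
        if (left.isParametric() && right.isParametric()) {
            // D<...> against C<...>: base types must be related and arities must match.
            if (!extendsOrEquals.test(left.name, right.name) || left.args.size() != right.args.size()) {
                throw new IllegalStateException(left + " is not compatible with " + right);
            }
            List<Type[]> result = new ArrayList<>();
            for (int i = 0; i < left.args.size(); i++) {
                result.add(new Type[] {left.args.get(i), right.args.get(i)});
            }
            return result;
        }
        if (left.isParametric() != right.isParametric()) {
            Type plain = left.isParametric() ? right : left;
            if (!plain.variable) {
                // A concrete type meeting a parametric type is an error.
                throw new IllegalStateException(left + " is not compatible with " + right);
            }
        }
        // Simple case: a singleton list with the relation itself.
        return Collections.singletonList(new Type[] {left, right});
    }
}
```

For example, decomposing `List[String]` against `Collection[T]` with a predicate that accepts (List, Collection) yields the single relation (String, T).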
answered Nov 12 '22 18:11 by akarnokd


The Hindley-Milner algorithm is fundamentally a unification algorithm, i.e. an algorithm for solving graph isomorphisms for graph equations with variables.

Hindley-Milner doesn't directly apply to your problem, but a Google search turned up some leads, e.g. "Pragmatic Subtyping in Polymorphic Languages", which says "We present a subtyping extension to the Hindley/Milner type system that is based on name inequivalence ...". (I haven't read it.)


... however, most type inference examples are written in mathematical notations, ML, Scala, Perl, etc., which I don't "speak".

I think you are going to have to get over that hurdle yourself. Type theory and type checking are fundamentally mathematical ... and difficult. You need to put in the hard yards to pick up the language.

answered Nov 12 '22 18:11 by Stephen C