Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Row polymorphic equality of type-level lists

TL;DR: I want to implement unification for row polymorphism, using a type family RIso (a :: [*]) (b :: [*]) :: Bool, but have gotten stuck.


Background

I’m writing a compiler for a language that has an effect system which uses row polymorphism, as in Extensible Records with Scoped Labels, to represent sets of effects attached to function types. That is, in this language, the kind of the function arrow is:

(->) :: * -> * -> Effect -> *

Where Effect is a sequence of effect labels such as IO or Unsafe—either a “cons” of a label with another effect row, an effect type variable, or the “pure” effect (an empty row). A simple example is the type of map:

<A, B, +E> (List<A>, (A -> B +E) -> List<B> +E)

In hypothetical Haskell syntax:

forall (a :: *) (b :: *) (e :: Effect).
  (a -> Eff e b) -> [a] -> Eff e [b]

This means map takes a List of values of type A, and a function from A to B with a set of effects +E, and returns a List of Bs, also requiring the effects +E. A pure function is fully polymorphic in its effect, which is defaulted to an empty row; so map is pure if its functional argument is pure; otherwise it requires the same effects as its argument.

I’m experimenting with representing the type system of this language within Haskell’s type system, so that I can lean on Haskell’s typechecker to write a verified typechecker in the manner of this blog post.

However, when I get to unifying function types, I need to unify the inputs and outputs by regular unification (equality), but I need to unify the effects by row unification, which is defined like so:

s ≃ { l | s′ } : θ₁
tail(r) ∉ dom(θ₁)    -- †
θ₁(r) ~ θ₁(s′) : θ₂
---------------------------------------- [uni-row]
{ l | r } ~ s : θ₂ ∘ θ₁

This says “if s can be rewritten into { l | s′ } under substitution θ₁, and r and s′ unify under substitution θ₂, then { l | r } unifies with s under substitution θ₂ ∘ θ₁, plus a side condition marked with a dagger (†) that I’ll explain momentarily.

The (≃) relation means that two rows are isomorphic according to the following three rules:

-------------------------- [row-head]
{ l | r } ≃ { l | r } : []

l ≠ l′
r ≃ { l | r′ } : θ
-------------------------------- [row-swap]
{ l′ | r } ≃ { l | l′ | r′ } : θ

fresh(β)
----------------------------- [row-var]
α ≃ { l | β } : [α ↦ {l | β}]

In my current unverified typechecker, these two functions look like this in Haskell:

unify typeEnv0 (Row l r) s = do

  -- Attempt to rewrite.
  maybeSubstitution <- isomorphicRows typeEnv0 l s (rowLast r)
  case maybeSubstitution of

    -- If rows are isomorphic:
    Just (s', substitution, typeEnv1) -> case substitution of

      -- Apply substitution, if any, and unify the tails.
      Just (x, t) -> let
        typeEnv2 = typeEnv1
          { typeEnvVars = Map.insert x t $ typeEnvVars typeEnv1 }
        in unify typeEnv2 r s'
      Nothing -> unify typeEnv1 r s'

    -- Non-isomorphic rows do not unify.
    Nothing -> reportTypeMismatch (Row l r) s

-- Gets the last element in a row.
rowLast :: Type -> Type
rowLast (Row _ r) = rowLast r
rowLast t = t

isomorphicRows implements the rule described in the paper:

When a row { l | r } is unified with some row s, we first try to rewrite s in the form { l | s′ } […] If this succeeds, the unification proceeds by unifying…the tail of the rows.

So given l, r, and s, along with the unification rule, isomorphicRows is used to assert that s can be rewritten into { l | r } under some substitution, failing (Nothing) or returning (Just) the tail of the rewritten row, the substitution (Maybe because it’s always empty or a singleton), and the updated type environment. We don’t actually pass r because we only need its last element rt.

isomorphicRows
  :: TypeEnv    -- Type environment
  -> Type       -- Effect label l
  -> Type       -- Effect row s
  -> Type       -- Effect row rt
  -> Typecheck
    (Maybe
      ( Type
      , Maybe (TypeId, Type)
      , TypeEnv
      ))

The [row-head] rule: a row which already begins with the given label is trivially rewritten by the identity substitution.

isomorphicRows typeEnv0 l (Row l' r') _rt
  | l == l'
  = pure $ Just (r', Nothing, typeEnv0)

The [row-swap] rule: a row that contains a label somewhere within it can be rewritten to place that label at the head.

isomorphicRows typeEnv0 l (Row l' r') rt
  | l /= l' = do
  maybeSubstitution <- isomorphicRows typeEnv0 l r' rt
  pure $ case maybeSubstitution of
    Just (r'', substitution, typeEnv1) -> Just
      (Row l r'', substitution, typeEnv1)
    Nothing -> Nothing

The [row-var] rule: when unifying with a type variable, no label is present, so we can’t directly test for equality, and must return a fresh variable for the row tail.

† This enforces the side condition that prevents unifying rows with a common tail but distinct prefixes, thereby ensuring termination even though we’re adding fresh type variables.

isomorphicRows typeEnv0 l r@(TypeVar a) rt
  | r /= rt = do
  (b, typeEnv1) <- freshTypeVar typeEnv0
  pure $ Just (b, Just (a, Row l b), typeEnv1)

In any other case, the rows are not isomorphic.

isomorphicRows _ _ _ _ = Nothing

What I’ve Tried

I’m attempting to translate row unification with isomorphicRows to the type level, using a type family operating on type-level lists of types:

type family RIso (ra :: [*]) (rb :: [*]) :: Bool where

  -- [row-head]
  RIso ((l) ': r) ((l) ': s) = RIso r s

  -- [row-swap]
  RIso ((l) ': (l') ': r) ((l') ': (l) ': r)
    = TNot (TEq l l')

  RIso r r = 'True

  RIso _ _ = 'False

type family TNot (a :: Bool) :: Bool where
  TNot 'False = 'True
  TNot 'True = 'False

type family TEq (a :: Bool) (b :: Bool) :: Bool where
  TEq a a = 'True
  TEq _ _ = 'False

That seems to work for basic cases:

> :kind! RIso '[] '[]
'True

> :kind! RIso '[Int] '[Int]
'True

> :kind! RIso '[Int, Char] '[Char, Int]
'True

> :kind! RIso '[Int, Char] '[Int, String]
'False

> :kind! RIso '[Int, Char, String] '[Char, Int, String]
'True

But fails on more complex cases:

> :kind! RIso '[Int, String, Char] '[Char, Int, String]
'False

I believe it’s because I may have implemented the [row-swap] rule incorrectly (the TNot seems suspicious), and I’m missing the [row-var] rule. I’m stuck on the implementation of [row-var] because I don’t know how to bring a fresh type variable into scope (or whether that’s even the right approach). I tried this:

type family REq (ra :: [*]) (rb :: [*]) :: Bool where
  …
  -- [row-var]
  REq (l ': r) rt = TAnd (TNot (TEq r rt)) (TEq r (Row l b))
  …

type family TAnd (a :: Bool) (b :: Bool) :: Bool where
  TAnd 'True 'True = 'True
  TAnd _ _ = 'False

But I wasn’t surprised to see that the free b generates a type error:

error: Not in scope: type variable ‘b’

Question

  • Is it possible to introduce a fresh variable with forall or something, or should I give up on trying to implement the typing rules directly and take a different approach?

  • Alternatively, is there some existing extensible-record library that specifically uses row unification (not some other method) so I can use the library or learn the technique from it?

Note that this is not the same as simple set equality, because there may be duplicate labels, and they are “scoped” in the sense described in the paper I linked at the top of this question. This is important for two reasons: first, I intend to extend this in the future so that an effect label may have type parameters, e.g., ST s; and second, I intend to use this machinery to implement extensible records and variants in the language.

like image 470
Jon Purdy Avatar asked Jun 27 '18 00:06

Jon Purdy


1 Answers

Is it possible to introduce a fresh variable with forall or something

Unfortunately not. That way lies impredicativity and there is no good story for it in Haskell.

Computation via type families will get stuck as soon as it inspects a variable. You will have to encode (row) variables of your guest language as concrete types to do anything with them.

like image 90
Li-yao Xia Avatar answered Nov 09 '22 05:11

Li-yao Xia