TL;DR: I want to implement unification for row polymorphism, using a type family RIso (a :: [*]) (b :: [*]) :: Bool
, but have gotten stuck.
I’m writing a compiler for a language that has an effect system which uses row polymorphism, as in Extensible Records with Scoped Labels, to represent sets of effects attached to function types. That is, in this language, the kind of the function arrow is:
(->) :: * -> * -> Effect -> *
Where Effect
is a sequence of effect labels such as IO
or Unsafe
—either a “cons” of a label with another effect row, an effect type variable, or the “pure” effect (an empty row). A simple example is the type of map
:
<A, B, +E> (List<A>, (A -> B +E) -> List<B> +E)
In hypothetical Haskell syntax:
forall (a :: *) (b :: *) (e :: Effect).
(a -> Eff e b) -> [a] -> Eff e [b]
This means map
takes a List
of values of type A
, and a function from A
to B
with a set of effects +E
, and returns a List
of B
s, also requiring the effects +E
. A pure function is fully polymorphic in its effect, which is defaulted to an empty row; so map
is pure if its functional argument is pure; otherwise it requires the same effects as its argument.
I’m experimenting with representing the type system of this language within Haskell’s type system, so that I can lean on Haskell’s typechecker to write a verified typechecker in the manner of this blog post.
However, when I get to unifying function types, I need to unify the inputs and outputs by regular unification (equality), but I need to unify the effects by row unification, which is defined like so:
s ≃ { l | s′ } : θ₁
tail(r) ∉ dom(θ₁) -- †
θ₁(r) ~ θ₁(s′) : θ₂
---------------------------------------- [uni-row]
{ l | r } ~ s : θ₂ ∘ θ₁
This says “if s
can be rewritten into { l | s′ }
under substitution θ₁
, and r
and s′
unify under substitution θ₂
, then { l | r }
unifies with s
under substitution θ₂ ∘ θ₁
, plus a side condition marked with a dagger (†) that I’ll explain momentarily.
The (≃) relation means that two rows are isomorphic according to the following three rules:
-------------------------- [row-head]
{ l | r } ≃ { l | r } : []
l ≠ l′
r ≃ { l | r′ } : θ
-------------------------------- [row-swap]
{ l′ | r } ≃ { l | l′ | r′ } : θ
fresh(β)
----------------------------- [row-var]
α ≃ { l | β } : [α ↦ {l | β}]
In my current unverified typechecker, these two functions look like this in Haskell:
unify typeEnv0 (Row l r) s = do
-- Attempt to rewrite.
maybeSubstitution <- isomorphicRows typeEnv0 l s (rowLast r)
case maybeSubstitution of
-- If rows are isomorphic:
Just (s', substitution, typeEnv1) -> case substitution of
-- Apply substitution, if any, and unify the tails.
Just (x, t) -> let
typeEnv2 = typeEnv1
{ typeEnvVars = Map.insert x t $ typeEnvVars typeEnv1 }
in unify typeEnv2 r s'
Nothing -> unify typeEnv1 r s'
-- Non-isomorphic rows do not unify.
Nothing -> reportTypeMismatch (Row l r) s
-- Gets the last element in a row.
rowLast :: Type -> Type
rowLast (Row _ r) = rowLast r
rowLast t = t
isomorphicRows
implements the rule described in the paper:
When a row
{ l | r }
is unified with some rows
, we first try to rewrites
in the form{ l | s′ }
[…] If this succeeds, the unification proceeds by unifying…the tail of the rows.
So given l
, r
, and s
, along with the unification rule, isomorphicRows
is used to assert that s
can be rewritten into { l | r }
under some substitution, failing (Nothing
) or returning (Just
) the tail of the rewritten row, the substitution (Maybe
because it’s always empty or a singleton), and the updated type environment. We don’t actually pass r
because we only need its last element rt
.
isomorphicRows
:: TypeEnv -- Type environment
-> Type -- Effect label l
-> Type -- Effect row s
-> Type -- Effect row rt
-> Typecheck
(Maybe
( Type
, Maybe (TypeId, Type)
, TypeEnv
))
The [row-head]
rule: a row which already begins with the given label is trivially rewritten by the identity substitution.
isomorphicRows typeEnv0 l (Row l' r') _rt
| l == l'
= pure $ Just (r', Nothing, typeEnv0)
The [row-swap]
rule: a row that contains a label somewhere within it can be rewritten to place that label at the head.
isomorphicRows typeEnv0 l (Row l' r') rt
| l /= l' = do
maybeSubstitution <- isomorphicRows typeEnv0 l r' rt
pure $ case maybeSubstitution of
Just (r'', substitution, typeEnv1) -> Just
(Row l r'', substitution, typeEnv1)
Nothing -> Nothing
The [row-var]
rule: when unifying with a type variable, no label is present, so we can’t directly test for equality, and must return a fresh variable for the row tail.
† This enforces the side condition that prevents unifying rows with a common tail but distinct prefixes, thereby ensuring termination even though we’re adding fresh type variables.
isomorphicRows typeEnv0 l r@(TypeVar a) rt
| r /= rt = do
(b, typeEnv1) <- freshTypeVar typeEnv0
pure $ Just (b, Just (a, Row l b), typeEnv1)
In any other case, the rows are not isomorphic.
isomorphicRows _ _ _ _ = Nothing
I’m attempting to translate row unification with isomorphicRows
to the type level, using a type family operating on type-level lists of types:
type family RIso (ra :: [*]) (rb :: [*]) :: Bool where
-- [row-head]
RIso ((l) ': r) ((l) ': s) = RIso r s
-- [row-swap]
RIso ((l) ': (l') ': r) ((l') ': (l) ': r)
= TNot (TEq l l')
RIso r r = 'True
RIso _ _ = 'False
type family TNot (a :: Bool) :: Bool where
TNot 'False = 'True
TNot 'True = 'False
type family TEq (a :: Bool) (b :: Bool) :: Bool where
TEq a a = 'True
TEq _ _ = 'False
That seems to work for basic cases:
> :kind! RIso '[] '[]
'True
> :kind! RIso '[Int] '[Int]
'True
> :kind! RIso '[Int, Char] '[Char, Int]
'True
> :kind! RIso '[Int, Char] '[Int, String]
'False
> :kind! RIso '[Int, Char, String] '[Char, Int, String]
'True
But fails on more complex cases:
> :kind! RIso '[Int, String, Char] '[Char, Int, String]
'False
I believe it’s because I may have implemented the [row-swap]
rule incorrectly (the TNot
seems suspicious), and I’m missing the [row-var]
rule. I’m stuck on the implementation of [row-var]
because I don’t know how to bring a fresh type variable into scope (or whether that’s even the right approach). I tried this:
type family REq (ra :: [*]) (rb :: [*]) :: Bool where
…
-- [row-var]
REq (l ': r) rt = TAnd (TNot (TEq r rt)) (TEq r (Row l b))
…
type family TAnd (a :: Bool) (b :: Bool) :: Bool where
TAnd 'True 'True = 'True
TAnd _ _ = 'False
But I wasn’t surprised to see that the free b
generates a type error:
error: Not in scope: type variable ‘b’
Is it possible to introduce a fresh variable with forall
or something, or should I give up on trying to implement the typing rules directly and take a different approach?
Alternatively, is there some existing extensible-record library that specifically uses row unification (not some other method) so I can use the library or learn the technique from it?
Note that this is not the same as simple set equality, because there may be duplicate labels, and they are “scoped” in the sense described in the paper I linked at the top of this question. This is important for two reasons: first, I intend to extend this in the future so that an effect label may have type parameters, e.g., ST s
; and second, I intend to use this machinery to implement extensible records and variants in the language.
Is it possible to introduce a fresh variable with forall or something
Unfortunately not. That way lies impredicativity and there is no good story for it in Haskell.
Computation via type families will get stuck as soon as it inspects a variable. You will have to encode (row) variables of your guest language as concrete types to do anything with them.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With