Parameterizing out type variables is nice, but it doesn't scale. As an example of what can happen, http://oleg.fi/gists/posts/2017-04-26-indexed-poptics.html gives an abstraction containing 9 type variables. I have been working on a framework for program transformations which are parameterized by the programming language, and could quite conceivable have dozens or hundreds of parameters in the future.
So here's the basic question: I have a datatype T which is parameterized over N types. How can I write a function over T, without writing down N type variables every time I use it?
Here are some approaches I've looked at, none of which are satisfiable:
Parameterize over type variables of kind * -> *
data V = Var1 | Var2 | Var3 | Var4
myfunc :: forall (v :: V -> *). Constraints v => v Var1 -> v Var2
myfunc = ...
So, now, instead of parameterizing over 4 type variables Var1, Var2, Var3, Var4
, I need only parameterize over one type variable of kind V -> *
.
This works, except that, in this example, myfunc
can't be called because v
can't be inferred. You would need to change it to Proxy v -> v Var1 -> v Var2
. And then, every time you want to use myfunc
with difference variables, you would need to define a separate GADT, with its own boilerplate. Something like this:
data MyV a where
MyVar1 :: Int -> MyV Var1
MyVar2 :: String -> MyV Var2
MyVar3 :: Bool -> MyV Var3
MyVar4 :: [Int] -> MyV Var4
Needless to say, this is very unsatisfying.
This approach is exactly the one taken by the multi-sorted parts of the compdata library. It's quite nice there, because this boilerplate lines up exactly with the normal datatype you'd be writing anyway.
Parameterize over type variable of kind (*,*,*,...)
or a record kind
{-# LANGUAGE TypeInType #-}
import Data.Kind
type Vars = (*,*,*,*)
myfunc :: forall (v :: Vars). ...
myfunc = ...
This doesn't work, because, as far as I can tell, there's no way to destruct a type variable v
of kind (*,*,*,*)
. I can create an instance (Int, String, Bool, [Int])
, but I can't actually extract the components Int, String, etc. The best I can do is write a type family:
type family Fst (v :: Vars) where
Fst '(a,b,c,d) = a
type family Snd ....
myfunc :: forall (v :: Vars). Fst Vars -> Snd Vars
However, this has the same problem as the previous solution: it can't infer v
unless you pass in a Proxy v
separately. I also tried adding a constraint type Extensional v = (v ~ '(Fst v, Snd v, Third v, Fourth v))
, but it didn't help.
Using existentially-quantified variables
data HasFourTypeVariables a b c d = ....
data IOnlyCareAboutTwo a b = forall c d. IOnlyCareAboutTwo (HasFourTypeVariables a b c d)
This doesn't work. Here's what it looks like when you tried to use it:
update :: IOnlyCareAboutTwo a b -> IOnlyCareAboutTwo a b
update = ...
useUpdate :: HasFourTypeVariables a b c d -> HasFourTypeVariables a b c d
useUpdate x = case update (IOnlyCareAboutTwo x) of
IOnlyCareAboutTwo y -> y
This doesn't typecheck, because the typechecker doesn't know that the input and output of update
have the same existential witnesses.
Use Backpack
Backpack looks like the best contender so far. Depending on a module signature with 5 types and associated operations/constraints is kinda like having 5 universally-quantified constrained type variables in every operation, except you don't need to write them down.
Backpack is still fairly new, and not yet integrated with Stack; I don't yet have any experience with it. Also, it seems to be built for parameterizing over entire packages rather than smaller units of functionality; I'm under the impression that it has poor support for the kind of explicit instantiation that would be required here.
Have a long list of type variables, and put up with it
Solution I'm considering doing. :(
Some Background
This problem is threatening to arise in my work on multi-language program transformation, extending https://arxiv.org/pdf/1707.04600.pdf .
In my system, a term in a programming language has a type Term f l
, where f
is a signature for a programming language (of kind (*->*)->(*->*)
-- see the CDTs paper for explanation), and l
is a sort. So, a statement in C may have type Term CSig StmtL
, while a Java expression may have type Term Java ExpL
.
So far, this is just two type variables. But, as I push the system into more generality and the ability to abstract over deeper and deeper semantic properties, the number of type variables may explode. Here are some examples of how:
I want to store annotations on AST nodes. Some, like a node label and its source origin, are decent ideas to have on every tree, but others, like symbol resolution, or mark of whether that subtree has been modified, are desirable in some trees but not others. I'd hence like my representations to be able to flexibly add annotations in some representations but not others, and to be able to write operators that only care whether a subset of these annotations are present. How to do this? One or many type variables for annotations, along with a lot of "HasSymbolResolutionAnnotation a" constraints.
After a lot of experience, and many hours of reflection, I've decided that mutable ASTs are actually a pretty good idea. I'd then like to be able to write operators that can work on both pure and mutable ASTs. I haven't yet figured out how best to do this, but you can bet it will add at least one type variable to my type.
Within that CSig
or JavaSig
signature, there may be many language-generic nodes, like "Add." For many simple analyses and transformations, it's enough just to say "this language has addition" and stick in a generic Add node. But for more complicated ones, it may matter whether addition in your language can overflow and how, whether the (+) operator is a primitive or can be overridden like in C++ or Haskell, and what constraints it places on the surrounding types. Now, instead of an "Add" node, your language may have an Add MonomorphicAddition NonOverridable OverflowWrapsToNegative
node, and your analysis-of-signs defines transfer functions for languages with an Add a b OverflowWrapsToNegative
node. There is no limit on the number of variations of an operator you can encode like this. And as long as there are many useful things you can say about such a parameterized operator that only reference a couple of its parameters, it will be desirable to treat them this way.
I hope this helps explain why this is a problem.
This answer was inspired by the Trees That Grow paper. It assumes that there exist only a limited number of "variants" of the data type, that only a small subset of all possible instantiations of N parameters are used.
{-# language DataKinds #-}
{-# language TypeFamilies #-}
{-# language KindSignatures #-}
{-# language PolyKinds #-}
{-# language FlexibleContexts #-}
We define our data type like this
data MyType (v::Variant) = MyType (Xa v) (Xb v) (Xc v) Int
data Variant = V1 | V2
type family Xa (v::Variant) :: * where
Xa V1 = Int
Xa V2 = Bool
type family Xb (v::Variant) :: * where
Xb V1 = Bool
Xb V2 = Char
type family Xc (v::Variant) :: * where
Xc V1 = String
Xc V2 = Maybe Int
There are two "variants" of the data type. Each changing field has its own (admittedly boilerplatery) type family which maps a variant to the actual type of the field in that variant.
Here's a simple function which works for all variants of the data type:
getInt :: MyType (v :: Variant) -> Int
getInt (MyType _ _ _ i) = i
With -XConstraintKinds
we can define a constraint shared by all fields:
{-# language ConstraintKinds #-}
import GHC.Exts (Constraint)
type MyForAll (p :: * -> Constraint) (v::Variant) = (p (Xa v),p (Xb v),p (Xc v))
We can use it to define functions like
myShow :: MyForAll Show (v :: Variant) => MyType v -> String
myShow (MyType a b c i) = show a ++ show b ++ show c ++ show i
We can also enable -XTypeApplications
to specify the variant:
λ :t MyType
MyType :: forall {v :: Variant}. Xa v -> Xb v -> Xc v -> Int -> MyType v
λ :set -XTypeApplications
λ :t MyType @V1
MyType @V1 :: Int -> Bool -> String -> Int -> MyType 'V1
If you want to group a number of type variables together you can do that. On the value level you would just use a record, and you can do that on the type level too. Create a record type, and then the lifted version of that can be used to group types. Record access gets a bit clunky since the value level record selector syntax does not lift.
Here's an example that should clarify what I mean.
{-# LANGUAGE StandaloneDeriving, TypeInType, UndecidableInstances #-}
module RecordTyVars where
import Data.Kind
-- The normal way, with 3 type variables.
data OExpr sym lit op = OVar sym | OLit lit | OPrimOp op [OExpr sym lit op]
deriving (Show)
oe :: OExpr String Integer Op
oe = OPrimOp Add [OVar "x", OLit 1]
data Op = Add | Sub
deriving (Show)
--------
-- Record that when lifted will contain the types.
data ExprTypes = Types Type Type Type
-- Record access functions, since the record syntax doesn't lift.
type family SymType (r :: ExprTypes) :: * where
SymType ('Types sym lit op) = sym
type family LitType (r :: ExprTypes) :: * where
LitType ('Types sym lit op) = lit
type family OpType (r :: ExprTypes) :: * where
OpType ('Types sym lit op) = op
-- Using the record of types
data Expr r = Var (SymType r) | Lit (LitType r) | PrimOp (OpType r) [Expr r]
-- Must use standalone deriving when thing the going gets tough.
deriving instance (Show (SymType r), Show (LitType r), Show (OpType r)) =>
Show (Expr r)
e :: Expr ('Types String Integer Op)
e = PrimOp Add [Var "x", Lit 1]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With