Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Preventing type-variable proliferation in Haskell

Parameterizing out type variables is nice, but it doesn't scale. As an example of what can happen, http://oleg.fi/gists/posts/2017-04-26-indexed-poptics.html gives an abstraction containing 9 type variables. I have been working on a framework for program transformations which are parameterized by the programming language, and could quite conceivable have dozens or hundreds of parameters in the future.

So here's the basic question: I have a datatype T which is parameterized over N types. How can I write a function over T, without writing down N type variables every time I use it?

Here are some approaches I've looked at, none of which are satisfiable:

Parameterize over type variables of kind * -> *

data V = Var1 | Var2 | Var3 | Var4

myfunc :: forall (v :: V -> *). Constraints v => v Var1 -> v Var2
myfunc = ...

So, now, instead of parameterizing over 4 type variables Var1, Var2, Var3, Var4, I need only parameterize over one type variable of kind V -> *.

This works, except that, in this example, myfunc can't be called because v can't be inferred. You would need to change it to Proxy v -> v Var1 -> v Var2. And then, every time you want to use myfunc with difference variables, you would need to define a separate GADT, with its own boilerplate. Something like this:

data MyV a where
  MyVar1 :: Int -> MyV Var1
  MyVar2 :: String -> MyV Var2
  MyVar3 :: Bool -> MyV Var3
  MyVar4 :: [Int] -> MyV Var4

Needless to say, this is very unsatisfying.

This approach is exactly the one taken by the multi-sorted parts of the compdata library. It's quite nice there, because this boilerplate lines up exactly with the normal datatype you'd be writing anyway.

Parameterize over type variable of kind (*,*,*,...) or a record kind

{-# LANGUAGE TypeInType #-}
import Data.Kind

type Vars = (*,*,*,*)

myfunc :: forall (v :: Vars). ...
myfunc = ...

This doesn't work, because, as far as I can tell, there's no way to destruct a type variable v of kind (*,*,*,*) . I can create an instance (Int, String, Bool, [Int]), but I can't actually extract the components Int, String, etc. The best I can do is write a type family:

type family Fst (v :: Vars) where
  Fst '(a,b,c,d) = a
type family Snd ....

myfunc :: forall (v :: Vars). Fst Vars -> Snd Vars

However, this has the same problem as the previous solution: it can't infer v unless you pass in a Proxy v separately. I also tried adding a constraint type Extensional v = (v ~ '(Fst v, Snd v, Third v, Fourth v)), but it didn't help.

Using existentially-quantified variables

data HasFourTypeVariables a b c d = ....
data IOnlyCareAboutTwo a b = forall c d. IOnlyCareAboutTwo (HasFourTypeVariables a b c d)

This doesn't work. Here's what it looks like when you tried to use it:

update :: IOnlyCareAboutTwo a b -> IOnlyCareAboutTwo a b
update = ...

useUpdate :: HasFourTypeVariables a b c d -> HasFourTypeVariables a b c d
useUpdate x = case update (IOnlyCareAboutTwo x) of
                IOnlyCareAboutTwo y -> y

This doesn't typecheck, because the typechecker doesn't know that the input and output of update have the same existential witnesses.

Use Backpack

Backpack looks like the best contender so far. Depending on a module signature with 5 types and associated operations/constraints is kinda like having 5 universally-quantified constrained type variables in every operation, except you don't need to write them down.

Backpack is still fairly new, and not yet integrated with Stack; I don't yet have any experience with it. Also, it seems to be built for parameterizing over entire packages rather than smaller units of functionality; I'm under the impression that it has poor support for the kind of explicit instantiation that would be required here.

Have a long list of type variables, and put up with it

Solution I'm considering doing. :(


Some Background

This problem is threatening to arise in my work on multi-language program transformation, extending https://arxiv.org/pdf/1707.04600.pdf .

In my system, a term in a programming language has a type Term f l, where f is a signature for a programming language (of kind (*->*)->(*->*) -- see the CDTs paper for explanation), and l is a sort. So, a statement in C may have type Term CSig StmtL, while a Java expression may have type Term Java ExpL.

So far, this is just two type variables. But, as I push the system into more generality and the ability to abstract over deeper and deeper semantic properties, the number of type variables may explode. Here are some examples of how:

  1. I want to store annotations on AST nodes. Some, like a node label and its source origin, are decent ideas to have on every tree, but others, like symbol resolution, or mark of whether that subtree has been modified, are desirable in some trees but not others. I'd hence like my representations to be able to flexibly add annotations in some representations but not others, and to be able to write operators that only care whether a subset of these annotations are present. How to do this? One or many type variables for annotations, along with a lot of "HasSymbolResolutionAnnotation a" constraints.

  2. After a lot of experience, and many hours of reflection, I've decided that mutable ASTs are actually a pretty good idea. I'd then like to be able to write operators that can work on both pure and mutable ASTs. I haven't yet figured out how best to do this, but you can bet it will add at least one type variable to my type.

  3. Within that CSig or JavaSig signature, there may be many language-generic nodes, like "Add." For many simple analyses and transformations, it's enough just to say "this language has addition" and stick in a generic Add node. But for more complicated ones, it may matter whether addition in your language can overflow and how, whether the (+) operator is a primitive or can be overridden like in C++ or Haskell, and what constraints it places on the surrounding types. Now, instead of an "Add" node, your language may have an Add MonomorphicAddition NonOverridable OverflowWrapsToNegative node, and your analysis-of-signs defines transfer functions for languages with an Add a b OverflowWrapsToNegative node. There is no limit on the number of variations of an operator you can encode like this. And as long as there are many useful things you can say about such a parameterized operator that only reference a couple of its parameters, it will be desirable to treat them this way.

I hope this helps explain why this is a problem.

like image 258
James Koppel Avatar asked Oct 13 '17 22:10

James Koppel


2 Answers

This answer was inspired by the Trees That Grow paper. It assumes that there exist only a limited number of "variants" of the data type, that only a small subset of all possible instantiations of N parameters are used.

{-# language DataKinds #-}
{-# language TypeFamilies #-}
{-# language KindSignatures #-}
{-# language PolyKinds #-}
{-# language FlexibleContexts #-}

We define our data type like this

data MyType (v::Variant) = MyType (Xa v) (Xb v) (Xc v) Int

data Variant = V1 | V2

type family Xa (v::Variant) :: * where
    Xa V1 = Int
    Xa V2 = Bool

type family Xb (v::Variant) :: * where
    Xb V1 = Bool
    Xb V2 = Char

type family Xc (v::Variant) :: * where
    Xc V1 = String
    Xc V2 = Maybe Int

There are two "variants" of the data type. Each changing field has its own (admittedly boilerplatery) type family which maps a variant to the actual type of the field in that variant.

Here's a simple function which works for all variants of the data type:

getInt :: MyType (v :: Variant) -> Int
getInt (MyType _ _ _ i) = i

With -XConstraintKinds we can define a constraint shared by all fields:

{-# language ConstraintKinds #-}
import GHC.Exts (Constraint)
type MyForAll (p :: * -> Constraint) (v::Variant) = (p (Xa v),p (Xb v),p (Xc v))

We can use it to define functions like

myShow :: MyForAll Show (v :: Variant) => MyType v -> String
myShow (MyType a b c i) = show a ++ show b ++ show c ++ show i

We can also enable -XTypeApplications to specify the variant:

λ :t MyType
MyType :: forall {v :: Variant}. Xa v -> Xb v -> Xc v -> Int -> MyType v
λ :set -XTypeApplications
λ :t MyType @V1
MyType @V1 :: Int -> Bool -> String -> Int -> MyType 'V1
like image 107
danidiaz Avatar answered Sep 28 '22 00:09

danidiaz


If you want to group a number of type variables together you can do that. On the value level you would just use a record, and you can do that on the type level too. Create a record type, and then the lifted version of that can be used to group types. Record access gets a bit clunky since the value level record selector syntax does not lift.

Here's an example that should clarify what I mean.

{-# LANGUAGE StandaloneDeriving, TypeInType, UndecidableInstances #-}
module RecordTyVars where
import Data.Kind

-- The normal way, with 3 type variables.
data OExpr sym lit op = OVar sym | OLit lit | OPrimOp op [OExpr sym lit op]
     deriving (Show)

oe :: OExpr String Integer Op
oe = OPrimOp Add [OVar "x", OLit 1]

data Op = Add | Sub
     deriving (Show)

--------

-- Record that when lifted will contain the types.
data ExprTypes = Types Type Type Type

-- Record access functions, since the record syntax doesn't lift.
type family SymType (r :: ExprTypes) :: * where
    SymType ('Types sym lit op) = sym
type family LitType (r :: ExprTypes) :: * where
    LitType ('Types sym lit op) = lit
type family OpType (r :: ExprTypes) :: * where
    OpType  ('Types sym lit op) = op

-- Using the record of types
data Expr r = Var (SymType r) | Lit (LitType r) | PrimOp (OpType r) [Expr r]

-- Must use standalone deriving when thing the going gets tough.
deriving instance (Show (SymType r), Show (LitType r), Show (OpType r)) =>
                  Show (Expr r)

e :: Expr ('Types String Integer Op)
e = PrimOp Add [Var "x", Lit 1]
like image 45
augustss Avatar answered Sep 28 '22 00:09

augustss