Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Ensuring that two (G)ADTs have the same underlying representation in (GHC) Haskell

Tags:

haskell

ghc

In Haskell, sometimes for performance people will use unsafeCoerce (or the safer coerce) to translate between types that have the same internal representation. The most common example of this that I know of is for lists of newtypes:

newtype Identity a = Identity a

f :: [Identity a] -> [a]
f = coerce

Now, I have two GADTs in a codebase I'm working on that look something like this pared down:

data Typ where
    PredT :: Typ
    ProcT :: [Typ] -> Typ
    IntT :: Typ
    ListT :: Typ -> Typ


data HKTyp v (f :: * -> * -> *) where
    HKPredT :: HKTyp v f
    HKProcT :: [HKTyp v f] -> HKTyp v f
    HKIntT :: HKTyp v f
    HKListT :: f v (HKTyp v f) -> HKTyp v f  

I need these types to be different (rather than using the later as a generalization of the former), because the singletons library (or at least the template haskell functions) doesn't like higher-kinded data. Now, because I have to keep these types separate, I want to have some conversion functions between them:

newtype Id v a = Id a

promoteHK :: Typ -> HKTyp v Id
promoteHK PredT = HKPredT
promoteHK (ProcT ts) = HKProcT (fmap promoteHK ts)
promoteHK IntT = HKIntT
promoteHK (ListT x) = HKListT (Id $ promoteHK x)

demoteHK :: HKTyp v Id -> Typ
demoteHK HKPredT = PredT
demoteHK (HKProcT (Id ts)) = ProcT (fmap demoteHK ts)
demoteHK HKIntT = IntT
demoteHK (HKListT (Id x)) = HKListT x

These are pretty mechanical to write, but that's not the issue.

While I'm sure in many cases, GHC could inline and beta-reduce applications of demoteHK and promoteHK at compile time, thus not causing any runtime costs for doing these conversions, I really want to be able to write

f :: [Typ] -> [HKTyp v Id]
f = coerce

to avoid traversing the data structure, since these types are so similar, and hence (I assume) should have the same underlying representation in memory.

My question is two-fold.

  1. Do these types in fact have the same memory representation in GHC?
  2. Are there strong guarantees in GHC for how (G)ADTs are laid out in memory that lets you do things like this in general?
like image 261
Nathan BeDell Avatar asked Feb 02 '21 14:02

Nathan BeDell


1 Answers

I haven't tested the performance of the following, and it may be convoluted enough that GHC, in fact, can't optimize it, but it will let us build a better tool. The idea is to use Generics.

The plan is to define a type class that coerces two types that have the same Generic structure along with a function that makes use of the class. Consider the following:

{-# LANGUAGE DeriveGeneric #-}
import GHC.Generics

class GenericCoerce a b where
  genericCoerce' :: a x -> b x

genericCoerce :: (Generic x, Generic y, GenericCoerce (Rep x) (Rep y)) => x -> y
genericCoerce = to . genericCoerce' . from

Of course, we still need to define what makes two Reps coercible, but just like your promoteHK and demoteHK definitions, this is somewhat mechanical:

{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE EmptyCase #-}

instance GenericCoerce V1 V1 where
  genericCoerce' = \case

instance GenericCoerce U1 U1 where
  genericCoerce' = id

instance (GenericCoerce f f', GenericCoerce g g') => GenericCoerce (f :+: g) (f' :+: g') where
  genericCoerce' (L1 x) = L1 (genericCoerce' x)
  genericCoerce' (R1 x) = R1 (genericCoerce' x)

instance (GenericCoerce f f', GenericCoerce g g') => GenericCoerce (f :*: g) (f' :*: g') where
  genericCoerce' (x :*: y) = genericCoerce' x :*: genericCoerce' y

instance GenericCoerce cs1 cs2 => GenericCoerce (M1 t m cs1) (M1 t m' cs2) where
  genericCoerce' (M1 x) = M1 (genericCoerce' x)

instance (Generic x, Generic y, GenericCoerce (Rep x) (Rep y)) => GenericCoerce (K1 t x) (K1 t y) where
  genericCoerce' (K1 x) = K1 (genericCoerce x)

This actually works for very basic cases! Consider a data type like:

data Foo = Bar | Baz
  deriving (Generic, Show)

We get the behavior we want:

> genericCoerce @Bool @Foo True
Baz

> genericCoerce @Foo @Bool Bar
False

However, the problem with this Generic way of doing coercion is that it doesn't exactly play nicely with the normal way of coercing data types. Specifically, the type rep of a given type and the type rep of that type wrapped in a newtype wrapper are not the same.

One possible solution is the use of (gasp) incoherent instances. This may be a line too far for you, but if you're okay with them, consider the following two additional instances:

-- instances to handle newtype constructor
instance {-# INCOHERENT #-} (Generic x, Rep x ~ D1 m x', GenericCoerce x' y) => GenericCoerce (C1 m2 (S1 m3 (Rec0 x))) y where
  genericCoerce' = genericCoerce' . unM1 . from . unK1 . unM1 . unM1

instance {-# INCOHERENT #-} (Generic y, Rep y ~ D1 m y', GenericCoerce x y') => GenericCoerce x (C1 m2 (S1 m3 (Rec0 y))) where
  genericCoerce' = M1 . M1 . K1 . to . M1 . genericCoerce'

These two instances specifically target the case where one of the types has a newtype wrapper. Incoherent instances are considered dangerous, and if you have a lot of nested types/newtypes in your coercion, perhaps something could go wrong. That said, with these two instance in play, you're good to go with the example you gave:

promoteHK :: Typ -> HKTyp v Id
promoteHK = genericCoerce

demoteHK :: HKTyp v Id -> Typ
demoteHK = genericCoerce

In action:

> promoteHK PredT
HKPredT

> promoteHK (ListT PredT)
HKListT (Id HKPredT)

> promoteHK (ListT (ListT (ListT PredT)))
HKListT (Id (HKListT (Id (HKListT (Id HKPredT)))))

> demoteHK (HKProcT [HKIntT, HKPredT])
ProcT [IntT,PredT]

So far, I haven't quite answered your question. You asked if two seemingly isomorphic types really do have the same memory representation in GHC and if there are any guarantees in GHC that let you do things like this in general (I assume by "things like this", you mean coercions between isomorphic data types).

As far as I'm aware, there are no guarantees from GHC, but genericCoerce has given us a slightly firmer ground to walk on. Excluding the incoherent instances hack, the original version of genericCoerce converts a data type with phantom type parameters into the same datatype just with different phantom parameters. Technically, I can provide no guarantee that GHC will store multiple instances of the same runtime data the same way, but it seems to me like a pretty easy assumption to make.

Once we add in the incoherent instances and the newtype wrapper shenanigans, we're standing on less solid grounds, but the fact that it all works is some consolation.

Indeed, now that we see that the core of genericCoerce really is a coercion (we're building the same data type up from what was just destructed in every case), and if we trust that the newtype-wrapper incoherent instances also function as a coercion, then we can write:

genericCoerceProbablySafe :: (Generic x, Generic y, GenericCoerce (Rep x) (Rep y)) => x -> y
genericCoerceProbablySafe = unsafeCoerce

We get better performance than genericCoerce and more type safety than unsafeCoerce, and we've reduced your questions down to: "Is the Generic Rep an accurate proxy for how GHC stores memory (up to newtype wrappers)?"

like image 111
DDub Avatar answered Oct 26 '22 00:10

DDub