In Haskell, sometimes for performance people will use unsafeCoerce
(or the safer coerce
) to translate between types that have the same internal representation. The most common example of this that I know of is for lists of newtypes:
newtype Identity a = Identity a
f :: [Identity a] -> [a]
f = coerce
Now, I have two GADTs in a codebase I'm working on that look something like this pared down:
data Typ where
PredT :: Typ
ProcT :: [Typ] -> Typ
IntT :: Typ
ListT :: Typ -> Typ
data HKTyp v (f :: * -> * -> *) where
HKPredT :: HKTyp v f
HKProcT :: [HKTyp v f] -> HKTyp v f
HKIntT :: HKTyp v f
HKListT :: f v (HKTyp v f) -> HKTyp v f
I need these types to be different (rather than using the later as a generalization of the former), because the singletons library (or at least the template haskell functions) doesn't like higher-kinded data. Now, because I have to keep these types separate, I want to have some conversion functions between them:
newtype Id v a = Id a
promoteHK :: Typ -> HKTyp v Id
promoteHK PredT = HKPredT
promoteHK (ProcT ts) = HKProcT (fmap promoteHK ts)
promoteHK IntT = HKIntT
promoteHK (ListT x) = HKListT (Id $ promoteHK x)
demoteHK :: HKTyp v Id -> Typ
demoteHK HKPredT = PredT
demoteHK (HKProcT (Id ts)) = ProcT (fmap demoteHK ts)
demoteHK HKIntT = IntT
demoteHK (HKListT (Id x)) = HKListT x
These are pretty mechanical to write, but that's not the issue.
While I'm sure in many cases, GHC could inline and beta-reduce applications of demoteHK
and promoteHK
at compile time, thus not causing any runtime costs for doing these conversions, I really want to be able to write
f :: [Typ] -> [HKTyp v Id]
f = coerce
to avoid traversing the data structure, since these types are so similar, and hence (I assume) should have the same underlying representation in memory.
My question is two-fold.
I haven't tested the performance of the following, and it may be convoluted enough that GHC, in fact, can't optimize it, but it will let us build a better tool. The idea is to use Generics
.
The plan is to define a type class that coerces two types that have the same Generic
structure along with a function that makes use of the class. Consider the following:
{-# LANGUAGE DeriveGeneric #-}
import GHC.Generics
class GenericCoerce a b where
genericCoerce' :: a x -> b x
genericCoerce :: (Generic x, Generic y, GenericCoerce (Rep x) (Rep y)) => x -> y
genericCoerce = to . genericCoerce' . from
Of course, we still need to define what makes two Rep
s coercible, but just like your promoteHK
and demoteHK
definitions, this is somewhat mechanical:
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE EmptyCase #-}
instance GenericCoerce V1 V1 where
genericCoerce' = \case
instance GenericCoerce U1 U1 where
genericCoerce' = id
instance (GenericCoerce f f', GenericCoerce g g') => GenericCoerce (f :+: g) (f' :+: g') where
genericCoerce' (L1 x) = L1 (genericCoerce' x)
genericCoerce' (R1 x) = R1 (genericCoerce' x)
instance (GenericCoerce f f', GenericCoerce g g') => GenericCoerce (f :*: g) (f' :*: g') where
genericCoerce' (x :*: y) = genericCoerce' x :*: genericCoerce' y
instance GenericCoerce cs1 cs2 => GenericCoerce (M1 t m cs1) (M1 t m' cs2) where
genericCoerce' (M1 x) = M1 (genericCoerce' x)
instance (Generic x, Generic y, GenericCoerce (Rep x) (Rep y)) => GenericCoerce (K1 t x) (K1 t y) where
genericCoerce' (K1 x) = K1 (genericCoerce x)
This actually works for very basic cases! Consider a data type like:
data Foo = Bar | Baz
deriving (Generic, Show)
We get the behavior we want:
> genericCoerce @Bool @Foo True
Baz
> genericCoerce @Foo @Bool Bar
False
However, the problem with this Generic
way of doing coercion is that it doesn't exactly play nicely with the normal way of coercing data types. Specifically, the type rep of a given type and the type rep of that type wrapped in a newtype wrapper are not the same.
One possible solution is the use of (gasp) incoherent instances. This may be a line too far for you, but if you're okay with them, consider the following two additional instances:
-- instances to handle newtype constructor
instance {-# INCOHERENT #-} (Generic x, Rep x ~ D1 m x', GenericCoerce x' y) => GenericCoerce (C1 m2 (S1 m3 (Rec0 x))) y where
genericCoerce' = genericCoerce' . unM1 . from . unK1 . unM1 . unM1
instance {-# INCOHERENT #-} (Generic y, Rep y ~ D1 m y', GenericCoerce x y') => GenericCoerce x (C1 m2 (S1 m3 (Rec0 y))) where
genericCoerce' = M1 . M1 . K1 . to . M1 . genericCoerce'
These two instances specifically target the case where one of the types has a newtype wrapper. Incoherent instances are considered dangerous, and if you have a lot of nested types/newtypes in your coercion, perhaps something could go wrong. That said, with these two instance in play, you're good to go with the example you gave:
promoteHK :: Typ -> HKTyp v Id
promoteHK = genericCoerce
demoteHK :: HKTyp v Id -> Typ
demoteHK = genericCoerce
In action:
> promoteHK PredT
HKPredT
> promoteHK (ListT PredT)
HKListT (Id HKPredT)
> promoteHK (ListT (ListT (ListT PredT)))
HKListT (Id (HKListT (Id (HKListT (Id HKPredT)))))
> demoteHK (HKProcT [HKIntT, HKPredT])
ProcT [IntT,PredT]
So far, I haven't quite answered your question. You asked if two seemingly isomorphic types really do have the same memory representation in GHC and if there are any guarantees in GHC that let you do things like this in general (I assume by "things like this", you mean coercions between isomorphic data types).
As far as I'm aware, there are no guarantees from GHC, but genericCoerce
has given us a slightly firmer ground to walk on. Excluding the incoherent instances hack, the original version of genericCoerce
converts a data type with phantom type parameters into the same datatype just with different phantom parameters. Technically, I can provide no guarantee that GHC will store multiple instances of the same runtime data the same way, but it seems to me like a pretty easy assumption to make.
Once we add in the incoherent instances and the newtype wrapper shenanigans, we're standing on less solid grounds, but the fact that it all works is some consolation.
Indeed, now that we see that the core of genericCoerce
really is a coercion (we're building the same data type up from what was just destructed in every case), and if we trust that the newtype-wrapper incoherent instances also function as a coercion, then we can write:
genericCoerceProbablySafe :: (Generic x, Generic y, GenericCoerce (Rep x) (Rep y)) => x -> y
genericCoerceProbablySafe = unsafeCoerce
We get better performance than genericCoerce
and more type safety than unsafeCoerce
, and we've reduced your questions down to: "Is the Generic
Rep
an accurate proxy for how GHC stores memory (up to newtype wrappers)?"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With