What/why are the differences between those three? Is a GADT (and regular data types) just a shorthand for a data family? Specifically what's the difference between:
data GADT a where
MkGADT :: Int -> GADT Int
data family FGADT a
data instance FGADT a where -- note not FGADT Int
MkFGADT :: Int -> FGADT Int
data family DF a
data instance DF Int where -- using GADT syntax, but not a GADT
MkDF :: Int -> DF Int
(Are those examples over-simplified, so I'm not seeing the subtleties of the differences?)
Data families are extensible, but GADTs are not. OTOH data family instances must not overlap. So I couldn't declare another instance/any other constructors for FGADT
; just like I can't declare any other constructors for GADT
. I can declare other instances for DF
.
With pattern matching on those constructors, the rhs of the equation does 'know' that the payload is Int
.
For class instances (I was surprised to find) I can write overlapping instances to consume GADTs:
instance C (GADT a) ...
instance {-# OVERLAPPING #-} C (GADT Int) ...
and similarly for (FGADT a), (FGADT Int)
. But not for (DF a)
: it must be for (DF Int)
-- that makes sense; there's no data instance DF a
, and if there were it would overlap.
ADDIT: to clarify @kabuhr's answer (thank you)
contrary to what I think you're claiming in part of your question, for a plain data family, matching on a constructor does not perform any inference
These types are tricky, so I expect I'd need explicit signatures to work with them. In that case the plain data family is easiest
inferDF (MkDF x) = x -- works without a signature
The inferred type inferDF :: DF Int -> Int
makes sense. Giving it a signature inferDF :: DF a -> a
doesn't make sense: there is no declaration for a data instance DF a ...
. Similarly with foodouble :: Foo Char a -> a
there is no data instance Foo Char a ...
.
GADTs are awkward, I already know. So neither of these work without an explicit signature
inferGADT (MkGADT x) = x
inferFGADT (MkFGADT x) = x
Mysterious "untouchable" message, as you say. What I meant in my "matching on those constructors" comment was: the compiler 'knows' on rhs of an equation that the payload is Int
(for all three constructors), so you'd better get any signatures consistent with that.
Then I'm thinking data GADT a where ...
is as if data instance GADT a where ...
. I can give a signature inferGADT :: GADT a -> a
or inferGADT :: GADT Int -> Int
(likewise for inferFGADT
). That makes sense: there is a data instance GADT a ...
or I can give a signature at a more specific type.
So in some ways data families are generalisations of GADTs. I also see as you say
So, in some ways, GADTs are generalizations of data families.
Hmm. (The reason behind the question is that GHC Haskell has got to the stage of feature bloat: there's too many similar-but-different extensions. I was trying to prune it down to a smaller number of underlying abstractions. Then @HTNW's approach of explaining in terms of yet further extensions is opposite to what would help a learner. IMO existentials in data types should be chucked out: use GADTs instead. PatternSynonyms should be explained in terms of data types and mapping functions between them, not the other way round. Oh, and there's some DataKinds stuff, which I skipped over on first reading.)
As a start, you should think of a data family as a collection of independent ADTs that happen to be indexed by a type, while a GADT is a single data type with an inferrable type parameter where constraints on that parameter (typically, equality constraints like a ~ Int
) can be brought into scope by pattern matching.
This means that the biggest difference is that, contrary to what I think you're claiming in part of your question, for a plain data family, matching on a constructor does not perform any inference on the type parameter. In particular, this typechecks:
inferGADT :: GADT a -> a
inferGADT (MkGADT n) = n
but this does not:
inferDF :: DF a -> a
inferDF (MkDF n) = n
and without type signatures, the first would fail to type check (with a mysterious "untouchable" message) while the second would be inferred as DF Int -> Int
.
The situation becomes quite a bit more confusing for something like your FGADT
type that combines data families with GADTs, and I confess I haven't really thought about how this works in detail. But, as an interesting example, consider:
data family Foo a b
data instance Foo Int a where
Bar :: Double -> Foo Int Double
Baz :: String -> Foo Int String
data instance Foo Char Double where
Quux :: Double -> Foo Char Double
data instance Foo Char String where
Zlorf :: String -> Foo Char String
In this case, Foo Int a
is a GADT with an inferrable a
parameter:
fooint :: Foo Int a -> a
fooint (Bar x) = x + 1.0
fooint (Baz x) = x ++ "ish"
but Foo Char a
is just a collection of separate ADTs, so this won't typecheck:
foodouble :: Foo Char a -> a
foodouble (Quux x) = x
for the same reason inferDF
won't typecheck above.
Now, getting back to your plain DF
and GADT
types, you can largely emulate DFs
just using GADTs
. For example, if you have a DF:
data family MyDF a
data instance MyDF Int where
IntLit :: Int -> MyDF Int
IntAdd :: MyDF Int -> MyDF Int -> MyDF Int
data instance MyDF Bool where
Positive :: MyDF Int -> MyDF Bool
you can write it as a GADT just by writing separate blocks of constructors:
data MyGADT a where
-- MyGADT Int
IntLit' :: Int -> MyGADT Int
IntAdd' :: MyGADT Int -> MyGADT Int -> MyGADT Int
-- MyGADT Bool
Positive' :: MyGADT Int -> MyGADT Bool
So, in some ways, GADTs are generalizations of data families. However, a major use case for data families is defining associated data types for classes:
class MyClass a where
data family MyRep a
instance MyClass Int where
data instance MyRep Int = ...
instance MyClass String where
data instance MyRep String = ...
where the "open" nature of data families is needed (and where the pattern-based inference methods of GADTs aren't helpful).
I think the difference becomes clear if we use PatternSynonyms
-style type signatures for data constructors. Lets start with Haskell 98
data D a = D a a
You get a pattern type:
pattern D :: forall a. a -> a -> D a
it can be read in two directions. D
, in "forward" or expression contexts, says, "forall a
, you can give me 2 a
s and I'll give you a D a
". "Backwards", as a pattern, it says, "forall a
, you can give me a D a
and I'll give you 2 a
s".
Now, the things you write in a GADT definition are not pattern types. What are they? Lies. Lies lies lies. Give them attention only insofar as the alternative is writing them out manually with ExistentialQuantification
. Let's use this one
data GD a where
GD :: Int -> GD Int
You get
-- vv ignore
pattern GD :: forall a. () => (a ~ Int) => Int -> GD a
This says: forall a
, you can give me a GD a
, and I can give you a proof that a ~ Int
, plus an Int
.
Important observation: The return/match type of a GADT constructor is always the "data type head". I defined data GD a where ...
; I got GD :: forall a. ... GD a
. This is also true for Haskell 98 constructors, and also data family
constructors, though it's a bit more subtle.
If I have a GD a
, and I don't know what a
is, I can pass into GD
anyway, even though I wrote GD :: Int -> GD Int
, which seems to say I can only match it with GD Int
s. This is why I say GADT constructors lie. The pattern type never lies. It clearly states that, forall a
, I can match a GD a
with the GD
constructor and get evidence for a ~ Int
and a value of Int
.
Ok, data family
s. Lets not mix them with GADTs yet.
data Nat = Z | S Nat
data Vect (n :: Nat) (a :: Type) :: Type where
VNil :: Vect Z a
VCons :: a -> Vect n a -> Vect (S n) a -- try finding the pattern types for these btw
data family Rect (ns :: [Nat]) (a :: Type) :: Type
newtype instance Rect '[] a = RectNil a
newtype instance Rect (n : ns) a = RectCons (Vect n (Rect ns a))
There are actually two data type heads now. As @K.A.Buhr says, the different data instance
s act like different data types that just happen to share a name. The pattern types are
pattern RectNil :: forall a. a -> Rect '[] a
pattern RectCons :: forall n ns a. Vect n (Rect ns a) -> Rect (n : ns) a
If I have a Rect ns a
, and I don't know what ns
is, I cannot match on it. RectNil
only takes Rect '[] a
s, RectCons
only takes Rect (n : ns) a
s. You might ask: "why would I want a reduction in power?" @KABuhr has given one: GADTs are closed (and for good reason; stay tuned), families are open. This doesn't hold in Rect
's case, as these instances already fill up the entire [Nat] * Type
space. The reason is actually newtype
.
Here's a GADT RectG
:
data RectG :: [Nat] -> Type -> Type where
RectGN :: a -> RectG '[] a
RectGC :: Vect n (RectG ns a) -> RectG (n : ns) a
I get
-- it's fine if you don't get these
pattern RectGN :: forall ns a. () => (ns ~ '[]) => a -> RectG ns a
pattern RectGC :: forall ns' a. forall n ns. (ns' ~ (n : ns)) =>
Vect n (RectG ns a) -> RectG ns' a
-- just note that they both have the same matched type
-- which means there needs to be a runtime distinguishment
If I have a RectG ns a
and don't know what ns
is, I can still match on it just fine. The compiler has to preserve this information with a data constructor. So, if I had a RectG [1000, 1000] Int
, I would incur an overhead of one million RectGN
constructors that all "preserve" the same "information". Rect [1000, 1000] Int
is fine, though, as I do not have the ability to match and tell whether a Rect
is RectNil
or RectCons
. This allows the constructor to be newtype
, as it holds no information. I would instead use a different GADT, somewhat like
data SingListNat :: [Nat] -> Type where
SLNN :: SingListNat '[]
SLNCZ :: SingListNat ns -> SingListNat (Z : ns)
SLNCS :: SingListNat (n : ns) -> SingListNat (S n : ns)
that stores the dimensions of a Rect
in O(sum ns)
space instead of O(product ns)
space (I think those are right). This is also why GADT
s are closed and families are open. A GADT is just like a normal data
type except it has equality evidence and existentials. It doesn't make sense to add constructors to a GADT any more than it makes sense to add constructors to a Haskell 98 type, because any code that doesn't know about one of the constructors is in for a very bad time. It's fine for families though, because, as you noticed, once you define a branch of a family, you cannot add more constructors in that branch. Once you know what branch you're in, you know the constructors, and no one can break that. You're not allowed to use any constructors if you don't know which branch to use.
Your examples don't really mix GADTs and data families. Pattern types are nifty in that they normalize away superficial differences in data
definitions, so let's take a look.
data family FGADT a
data instance FGADT a where
MkFGADT :: Int -> FGADT Int
Gives you
pattern MkFGADT :: forall a. () => (a ~ Int) => Int -> FGADT a
-- no different from a GADT; data family does nothing
But
data family DF a
data instance DF Int where
MkDF :: Int -> DF Int
gives
pattern MkDF :: Int -> DF Int
-- GADT syntax did nothing
Here's a proper mixing
data family Map k :: Type -> Type
data instance Map Word8 :: Type -> Type where
MW8BitSet :: BitSet8 -> Map Word8 Bool
MW8General :: GeneralMap Word8 a -> Map Word8 a
Which gives patterns
pattern MW8BitSet :: forall a. () => (a ~ Bool) => BitSet8 -> Map Word8 a
pattern MW8General :: forall a. GeneralMap Word8 a -> Map Word8 a
If I have a Map k v
and I don't know what k
is, I can't match it against MW8General
or MW8BitSet
, because those only want Map Word8
s. This is the data family
's influence. If I have a Map Word8 v
and I don't know what v
is, matching on the constructors can reveal to me whether it's known to be Bool
or is something else.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With