Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Motivation behind Phantom Types?

Don Stewart's Haskell in the Large's presentation mentioned Phantom Types:

data Ratio n = Ratio Double 1.234 :: Ratio D3  data Ask ccy = Ask Double Ask 1.5123 :: Ask GBP 

I read over his bullet points about them, but I did not understand them. In addition, I read the Haskell Wiki on the topic. Yet I still am missing their point.

What's the motivation to use a phantom type?

like image 801
Kevin Meredith Avatar asked Jan 31 '15 02:01

Kevin Meredith


People also ask

What are phantom types and when would you use them?

A phantom type is a generic type that is declared but never used inside a type where it is declared. It is usually used as a generic constraint to build a more type-safe and robust API.

What are phantom types Haskell?

From HaskellWiki. A phantom type is a parameterised type whose parameters do not all appear on the right-hand side of its definition, e.g. from Control.Applicative: newtype Const a b = Const { getConst :: a } Here Const is a phantom type, because the b parameter doesn't appear after the = sign.


2 Answers

To answer the "what's the motivation to use a phantom type". There is two points:

  • to make invalid states unrepresentable, which is well explained in Aadit's answer
  • Carry some of the information on the type level

For example you could have distances tagged by the length unit:

{-# LANGUAGE GeneralizedNewtypeDeriving #-}  newtype Distance a = Distance Double   deriving (Num, Show)  data Kilometer data Mile  marathonDistance :: Distance Kilometer marathonDistance = Distance 42.195  distanceKmToMiles :: Distance Kilometer -> Distance Mile distanceKmToMiles (Distance km) = Distance (0.621371 * km)  marathonDistanceInMiles :: Distance Mile marathonDistanceInMiles = distanceKmToMiles marathonDistance 

And you can avoid Mars Climate Orbiter disaster:

>>> marathonDistanceInMiles Distance 26.218749345  >>> marathonDistanceInMiles + marathonDistance  <interactive>:10:27:     Couldn't match type ‘Kilometer’ with ‘Mile’     Expected type: Distance Mile       Actual type: Distance Kilometer     In the second argument of ‘(+)’, namely ‘marathonDistance’     In the expression: marathonDistanceInMiles + marathonDistance 

There are slight varitions to this "pattern". You can use DataKinds to have closed set of units:

{-# LANGUAGE GeneralizedNewtypeDeriving #-} {-# LANGUAGE KindSignatures #-} {-# LANGUAGE DataKinds #-}  data LengthUnit = Kilometer | Mile  newtype Distance (a :: LengthUnit) = Distance Double   deriving (Num, Show)  marathonDistance :: Distance 'Kilometer marathonDistance = Distance 42.195  distanceKmToMiles :: Distance 'Kilometer -> Distance 'Mile distanceKmToMiles (Distance km) = Distance (0.621371 * km)  marathonDistanceInMiles :: Distance 'Mile marathonDistanceInMiles = distanceKmToMiles marathonDistance 

And it will work similarly:

>>> marathonDistanceInMiles Distance 26.218749345  >>> marathonDistance + marathonDistance Distance 84.39  >>> marathonDistanceInMiles + marathonDistance  <interactive>:28:27:     Couldn't match type ‘'Kilometer’ with ‘'Mile’     Expected type: Distance 'Mile       Actual type: Distance 'Kilometer     In the second argument of ‘(+)’, namely ‘marathonDistance’     In the expression: marathonDistanceInMiles + marathonDistance 

But now the Distance can be only in kilometers or miles, we can't add more units later. That might be useful in some use cases.


We could also do:

data Distance = Distance { distanceUnit :: LengthUnit, distanceValue :: Double }    deriving (Show) 

In the distance case we can work out the addition, for example translate to kilometers if different units are involved. But this doesn't work well for currencies which ratio isn't constant over time etc.


And it's possible to use GADTs for that instead, which may be simpler approach in some situations:

{-# LANGUAGE GeneralizedNewtypeDeriving #-} {-# LANGUAGE KindSignatures #-} {-# LANGUAGE DataKinds #-} {-# LANGUAGE GADTs #-} {-# LANGUAGE StandaloneDeriving #-}  data Kilometer data Mile  data Distance a where   KilometerDistance :: Double -> Distance Kilometer   MileDistance :: Double -> Distance Mile  deriving instance Show (Distance a)  marathonDistance :: Distance Kilometer marathonDistance = KilometerDistance 42.195  distanceKmToMiles :: Distance Kilometer -> Distance Mile distanceKmToMiles (KilometerDistance km) = MileDistance (0.621371 * km)  marathonDistanceInMiles :: Distance Mile marathonDistanceInMiles = distanceKmToMiles marathonDistance 

Now we know the unit also on the value level:

>>> marathonDistanceInMiles  MileDistance 26.218749345 

This approach especially greately simplifies Expr a example from Aadit's answer:

{-# LANGUAGE GADTs #-}  data Expr a where   Number     :: Int -> Expr Int   Boolean    :: Bool -> Expr Bool   Increment  :: Expr Int -> Expr Int   Not        :: Expr Bool -> Expr Bool 

It's worth pointing out that the latter variations require non-trivial language extensions (GADTs, DataKinds, KindSignatures), which might not be supported in your compiler. That's might be the case with Mu compiler Don mentions.

like image 129
phadej Avatar answered Sep 28 '22 03:09

phadej


The motivation behind using phantom types is to specialize the return type of data constructors. For example, consider:

data List a = Nil | Cons a (List a) 

The return type of both Nil and Cons is List a by default (which is generalized for all lists of type a).

Nil  ::                List a Cons :: a -> List a -> List a                        |____|                           |                     -- return type is generalized 

Also note that Nil is a phantom constructor (i.e. its return type doesn't depend upon its arguments, vacuously in this case, but nonetheless the same).

Because Nil is a phantom constructor we can specialize Nil to any type we want (e.g. Nil :: List Int or Nil :: List Char).


Normal algebraic data types in Haskell allow you to choose the type of the arguments of a data constructor. For example, we chose the type of arguments for Cons above (a and List a).

However, it doesn't allow you to choose the return type of a data constructor. The return type is always generalized. This is fine for most cases. However, there are exceptions. For example:

data Expr a = Number     Int             | Boolean    Bool             | Increment (Expr Int)             | Not       (Expr Bool) 

The type of the data constructors are:

Number    :: Int       -> Expr a Boolean   :: Bool      -> Expr a Increment :: Expr Int  -> Expr a Not       :: Expr Bool -> Expr a 

As you can see, the return type of all the data constructors are generalized. This is problematic because we know that Number and Increment must always return an Expr Int and Boolean and Not must always return an Expr Bool.

The return types of the data constructors are wrong because they are too general. For example, Number cannot possibly return an Expr a but yet it does. This allows you to write wrong expressions which the type checker won't catch. For example:

Increment (Boolean False) -- you shouldn't be able to increment a boolean Not       (Number  0)     -- you shouldn't be able to negate a number 

The problem is that we can't specify the return type of data constructors.


Notice that all the data constructors of Expr are phantom constructors (i.e. their return type doesn't depend upon their arguments). A data type whose constructors are all phantom constructors is called a phantom type.

Remember that the return type of phantom constructors like Nil can be specialized to any type we want. Hence, we can create smart constructors for Expr as follows:

number    :: Int       -> Expr Int boolean   :: Bool      -> Expr Bool increment :: Expr Int  -> Expr Int not       :: Expr Bool -> Expr Bool  number    = Number boolean   = Boolean increment = Increment not       = Not 

Now we can use the smart constructors instead of the normal constructors and our problem is solved:

increment (boolean False) -- error not       (number  0)     -- error 

So phantom constructors are useful when you want to specialize the return type of a data constructor and phantom types are data types whose constructors are all phantom constructors.


Note that data constructors like Left and Right are also phantom constructors:

data Either a b = Left a | Right b  Left  :: a -> Either a b Right :: b -> Either a b 

The reason is that although the return type of these data constructors do depend upon their arguments yet they are still generalized because they only partially depend upon their arguments.

Simple way to know if a data constructor is a phantom constructor:

Do all the type variables appearing in the return type of the data constructor also appear in the arguments of the data constructor? If yes, it's not a phantom constructor.

Hope that helps.

like image 37
Aadit M Shah Avatar answered Sep 28 '22 04:09

Aadit M Shah