
Identity function for phantom types

Is it possible to write something like an identity function with phantom types, with the purpose of converting the type?

For example, given the following type definitions

data Nucleotide a = A | C | G | T | U
data RNA = RNA
data DNA = DNA

I would like to write a conversion function like

r2d :: Nucleotide RNA -> Nucleotide DNA
r2d U = T
r2d x = x

This does not type-check, however, because the single variable x cannot have a different type on each side of the equation: it is bound as a Nucleotide RNA but would have to be returned as a Nucleotide DNA.

Is it possible to write this without having to enumerate every constructor, as in the following?

r2d :: Nucleotide RNA -> Nucleotide DNA
r2d U = T
r2d A = A
r2d C = C
r2d G = G
beardc asked Nov 02 '14 04:11



1 Answer

TL;DR:
Don't design a datatype in which invalid data is representable:
T :: Nucleotide RNA type-checks even though it is biological nonsense, so you can end up evaluating r2d T (a runtime crash that you could have prevented at compile time).

Note that Chris Drost's answer deserves credit for being a good answer to the technical question as asked.


Problem

I noticed a potential source of crashes: your function r2d is not total, because r2d T is not defined. I realised that this is because you never intend to have T :: Nucleotide RNA (nor U :: Nucleotide DNA). That is a problem, because any time r2d T arises by accident (from bad user input, say), your whole program will crash.

This is a design flaw in your type. A major point of the type system is to make invalid data impossible, yet your code allows T :: Nucleotide RNA and even T :: Nucleotide [Bool].
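To make the flaw concrete, here is a small sketch using the type definitions from the question. The deriving clause and the three binding names (thymineInRNA, uracilInDNA, nonsense) are my own additions for illustration. The compiler accepts every binding, because the phantom parameter places no restriction on which constructors may be used:

```haskell
-- The phantom-type definitions from the question
-- (deriving clause added here so the values can be compared and printed).
data Nucleotide a = A | C | G | T | U deriving (Eq, Show)
data RNA = RNA
data DNA = DNA

-- Thymine tagged as RNA: biologically invalid, but it type-checks.
thymineInRNA :: Nucleotide RNA
thymineInRNA = T

-- Uracil tagged as DNA: also invalid, also accepted.
uracilInDNA :: Nucleotide DNA
uracilInDNA = U

-- The tag does not even have to be RNA or DNA.
nonsense :: Nucleotide [Bool]
nonsense = T
```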

Straightforward solution

Sadly, the solution is to use more boring, less slick types that distinguish DNA's C from RNA's C, but you can use derived Enum instances to convert between them without all the typing.

data DNA = A | C | G | T deriving (Eq, Show, Read, Enum)
data RNA = A' | C' | G' | U' deriving (Eq, Show, Read, Enum)

r2d :: RNA -> DNA
r2d = toEnum . fromEnum

d2r :: DNA -> RNA
d2r = toEnum . fromEnum

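toEnum . fromEnum :: (Enum a, Enum b) => a -> b works by converting a value of the first Enum type to an Int (the position of its constructor) and then converting that Int to the other Enum type. It relies on both types listing their corresponding constructors in the same order.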

Now r2d T is just a type error, so a program containing it won't compile, whereas with the phantom type it compiles and then crashes at runtime if the user manages to enter invalid data.
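As a quick illustration of the position-based conversion, here is a minimal sketch that repeats the definitions above and adds one helper, strandToDNA, which is a hypothetical name of mine and not part of the original code:

```haskell
data DNA = A | C | G | T deriving (Eq, Show, Read, Enum)
data RNA = A' | C' | G' | U' deriving (Eq, Show, Read, Enum)

-- fromEnum gives the constructor's position (U' is 3);
-- toEnum picks the constructor at that position in the target type (T).
r2d :: RNA -> DNA
r2d = toEnum . fromEnum

d2r :: DNA -> RNA
d2r = toEnum . fromEnum

-- Hypothetical helper: convert a whole strand element by element.
strandToDNA :: [RNA] -> [DNA]
strandToDNA = map r2d
```

In GHCi, strandToDNA [A', U', G'] evaluates to [A,T,G], while r2d T is rejected by the type checker because T is a DNA value and not an RNA one.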

Must we distinguish between RNA's C and DNA's C?

(No...)

You might feel that it's wrong to differentiate between C and C', since they're the same from a biological/chemical point of view. There may be a compromise position where you have a phantom type with constructors A | C | G | TU, and read and show user data differently depending on the context:

{-# LANGUAGE FlexibleInstances #-}
data Nucleotide a = A | C | G | TU deriving (Eq,Enum)
data RNA = RNA
data DNA = DNA

instance Show (Nucleotide DNA) where
  show A = "A"
  show C = "C"
  show G = "G"
  show TU = "T"

instance Show (Nucleotide RNA) where
  show A = "A"
  show C = "C"
  show G = "G"
  show TU = "U"

r2d :: Nucleotide RNA -> Nucleotide DNA
r2d = toEnum . fromEnum

d2r :: Nucleotide DNA -> Nucleotide RNA
d2r = toEnum . fromEnum
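To see what the compromise buys you, here is a rough sketch that repeats the definitions above in compact form and adds two rendering helpers. The names showRNA and showDNA are hypothetical, introduced only for this example. The same TU constructor prints as U or as T depending on the phantom tag:

```haskell
{-# LANGUAGE FlexibleInstances #-}

data Nucleotide a = A | C | G | TU deriving (Eq, Enum)
data RNA = RNA
data DNA = DNA

instance Show (Nucleotide DNA) where
  show A  = "A"
  show C  = "C"
  show G  = "G"
  show TU = "T"

instance Show (Nucleotide RNA) where
  show A  = "A"
  show C  = "C"
  show G  = "G"
  show TU = "U"

r2d :: Nucleotide RNA -> Nucleotide DNA
r2d = toEnum . fromEnum

-- Hypothetical helpers: render a strand using the instance chosen by its tag.
showRNA :: [Nucleotide RNA] -> String
showRNA = concatMap show

showDNA :: [Nucleotide DNA] -> String
showDNA = concatMap show
```

With these definitions, showRNA [A, TU, G] gives "AUG" and showDNA (map r2d [A, TU, G]) gives "ATG".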

Slick, but...

Sometimes making a complicated type just increases the number of extensions you need, when tolerating a few 's would give you something with fewer potential problems.

It seems to me you'd be better off with my first solution, writing custom instances for Show RNA and Read RNA so that the user doesn't need to put the ' on the end of the letter.
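One way such instances might look is sketched below. This is only my guess at a reasonable shape, not the one true implementation: it accepts a single upper-case letter after optional leading whitespace, and it drops the derived Show and Read from the first solution in favour of hand-written ones.

```haskell
import Data.Char (isSpace)

data RNA = A' | C' | G' | U' deriving (Eq, Enum)

-- Print each nucleotide as a bare letter, without the prime.
instance Show RNA where
  show A' = "A"
  show C' = "C"
  show G' = "G"
  show U' = "U"

-- Parse a bare letter, returning the unconsumed input alongside the result.
-- An empty list signals that nothing could be parsed.
instance Read RNA where
  readsPrec _ input = case dropWhile isSpace input of
    'A' : rest -> [(A', rest)]
    'C' : rest -> [(C', rest)]
    'G' : rest -> [(G', rest)]
    'U' : rest -> [(U', rest)]
    _          -> []
```

With these instances, read "U" :: RNA yields U', and show U' yields "U", so the primes never leak into user-facing text.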

Always avoid runtime errors if you can

Note, though, that read is a partial function (and therefore a potential cause of program crashes): it throws an error whenever its input does not parse. You're better off using readMay from the safe package, so that you can recover gracefully, give your user a polite error message and offer the chance to fix the input, rather than crashing. For large amounts of complex structured data, where read and readMay are needlessly slow, write a parser using Parsec or similar instead.
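A minimal sketch of the graceful-failure approach follows. To avoid an extra dependency it uses readMaybe from Text.Read in base, which has the same type as readMay (Read a => String -> Maybe a); that substitution is mine. The function name parseDNA and the wording of the error message are likewise just illustrative.

```haskell
import Text.Read (readMaybe)

data DNA = A | C | G | T deriving (Eq, Show, Read, Enum)

-- Hypothetical helper: parse user input without ever crashing.
-- Left carries a message that can be shown to the user.
parseDNA :: String -> Either String DNA
parseDNA s = case readMaybe s of
  Just n  -> Right n
  Nothing -> Left ("Not a DNA nucleotide: " ++ show s)
```

Here parseDNA "T" returns Right T, whereas parseDNA "U" returns a Left value that the caller can report before asking for the input again.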

AndrewC answered Oct 08 '22 16:10
