Is it possible to write something like an identity function with phantom types, with the purpose of converting the type?
For example, given the following type definitions
data Nucleotide a = A | C | G | T | U
data RNA = RNA
data DNA = DNA
I would like to write a conversion function like
r2d :: Nucleotide RNA -> Nucleotide DNA
r2d U = T
r2d x = x
This does not type-check, however, because the single variable x
cannot have type Nucleotide RNA on the left-hand side and Nucleotide DNA on the right.
Is it possible to write this without having to enumerate every constructor, as in the following?
r2d :: Nucleotide RNA -> Nucleotide DNA
r2d U = T
r2d A = A
r2d C = C
r2d G = G
TL;DR:
Don't make a datatype where invalid data is possible: T :: Nucleotide RNA is possible, and that's nonsense biologically, so you might end up evaluating r2d T (a runtime crash you could have prevented at compile time).
Note that Chris Drost's answer deserves credit for being a good answer to the technical question as asked.
I noticed a potential source of crashes: your function r2d is not total. r2d T is not defined, and I realised that this is because you have no intention of ever having T :: Nucleotide RNA (nor U :: Nucleotide DNA). That's a problem, because any time a user error accidentally produces r2d T, your whole program will crash.
This is a design flaw in your type. A major point of the type system is to make invalid data impossible, yet your code allows T :: Nucleotide RNA and even T :: Nucleotide [Bool].
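To make the problem concrete, here is a minimal sketch built on the definitions from the question. The names bogusRna, bogusList and safeR2d are hypothetical, added only for illustration; safeR2d catches the pattern-match failure so the crash can be observed without killing the program.

```haskell
import Control.Exception (PatternMatchFail, evaluate, try)

data Nucleotide a = A | C | G | T | U deriving (Eq, Show)
data RNA = RNA
data DNA = DNA

-- The enumerated conversion from the question; note there is no case for T.
r2d :: Nucleotide RNA -> Nucleotide DNA
r2d U = T
r2d A = A
r2d C = C
r2d G = G

-- Both of these type-check, although neither makes biological sense.
bogusRna :: Nucleotide RNA
bogusRna = T

bogusList :: Nucleotide [Bool]
bogusList = T

-- Hypothetical helper: force r2d and turn a pattern-match failure into Nothing.
safeR2d :: Nucleotide RNA -> IO (Maybe (Nucleotide DNA))
safeR2d n = do
  r <- try (evaluate (r2d n)) :: IO (Either PatternMatchFail (Nucleotide DNA))
  pure (either (const Nothing) Just r)
```

Calling safeR2d bogusRna yields Nothing, because r2d T has no matching equation; without the wrapper, the same call would abort the program at runtime.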
Sadly the solution is to make more boring/less slick types where there's a distinction between DNA's C and RNA's C, but you can use a derived Enum instance to convert them without all the typing.
data DNA = A | C | G | T deriving (Eq, Show, Read, Enum)
data RNA = A' | C' | G' | U' deriving (Eq, Show, Read, Enum)
r2d :: RNA -> DNA
r2d = toEnum.fromEnum
d2r :: DNA -> RNA
d2r = toEnum.fromEnum
toEnum.fromEnum :: (Enum a, Enum b) => a -> b works by converting from the first Enum type to Int, then from Int to the other Enum type. This relies on both types listing their corresponding constructors in the same order.
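As a rough illustration of the round trip through Int, the sketch below pairs each constructor with its position and with its converted counterpart. The Bounded instances and the names rnaPositions and conversions are assumptions added for this example; they are not part of the original definitions.

```haskell
-- Bounded is an addition for this sketch, so every constructor can be enumerated.
data DNA = A | C | G | T deriving (Eq, Show, Read, Enum, Bounded)
data RNA = A' | C' | G' | U' deriving (Eq, Show, Read, Enum, Bounded)

r2d :: RNA -> DNA
r2d = toEnum . fromEnum

d2r :: DNA -> RNA
d2r = toEnum . fromEnum

-- The Int positions that the conversion passes through (derived Enum counts from 0).
rnaPositions :: [(RNA, Int)]
rnaPositions = [(n, fromEnum n) | n <- [minBound .. maxBound]]

-- Every RNA nucleotide paired with its DNA counterpart.
conversions :: [(RNA, DNA)]
conversions = [(n, r2d n) | n <- [minBound .. maxBound]]

-- r2d T   -- would be rejected at compile time: T is a DNA, not an RNA
```

Because U' and T both sit in the fourth position, the conversion maps one to the other; reordering the constructors of either type would silently change the mapping.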
Now r2d T is just a type error, so the program won't compile if you write it, whereas with the phantom type it'll compile and crash at runtime if the user manages to enter invalid data.
(No....)
You might feel that it's wrong to differentiate between C and C', in that they're the same from a biological/chemical point of view, and there may be some compromise position where you have a phantom type with constructors A | C | G | TU and read user data differently depending on the context:
{-# LANGUAGE FlexibleInstances #-}
data Nucleotide a = A | C | G | TU deriving (Eq,Enum)
data RNA = RNA
data DNA = DNA
instance Show (Nucleotide DNA) where
  show A = "A"
  show C = "C"
  show G = "G"
  show TU = "T"
instance Show (Nucleotide RNA) where
  show A = "A"
  show C = "C"
  show G = "G"
  show TU = "U"
r2d :: (Nucleotide RNA) -> (Nucleotide DNA)
r2d = toEnum.fromEnum
d2r :: (Nucleotide DNA) -> (Nucleotide RNA)
d2r = toEnum.fromEnum
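A short usage sketch of this compromise follows, showing that the same TU constructor is displayed as T or U depending on the phantom type. The helpers renderDNA and renderRNA are hypothetical names introduced here for illustration.

```haskell
{-# LANGUAGE FlexibleInstances #-}

data Nucleotide a = A | C | G | TU deriving (Eq, Enum)
data RNA = RNA
data DNA = DNA

instance Show (Nucleotide DNA) where
  show A = "A"
  show C = "C"
  show G = "G"
  show TU = "T"

instance Show (Nucleotide RNA) where
  show A = "A"
  show C = "C"
  show G = "G"
  show TU = "U"

r2d :: Nucleotide RNA -> Nucleotide DNA
r2d = toEnum . fromEnum

d2r :: Nucleotide DNA -> Nucleotide RNA
d2r = toEnum . fromEnum

-- Hypothetical helpers: print a whole strand using the context-specific Show.
renderDNA :: [Nucleotide DNA] -> String
renderDNA = concatMap show

renderRNA :: [Nucleotide RNA] -> String
renderRNA = concatMap show
```

Here renderDNA [G, A, TU] and renderRNA (map d2r [G, A, TU]) display the same underlying constructors differently, and every value of either type is valid, so r2d and d2r are total.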
Sometimes making a complicated type just increases the number of extensions you need to use, when tolerating a few 's would give you something with fewer potential problems.
It seems to me you'd be better off with my first solution and writing custom instances for Show RNA and Read RNA, so that the user doesn't need to put the ' on the end of the letter.
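One way such instances might look is sketched below. This is only an illustration under the assumption that each nucleotide is written as a single upper-case letter; the readsPrec definition skips leading whitespace and accepts nothing else.

```haskell
import Data.Char (isSpace)

data RNA = A' | C' | G' | U' deriving (Eq, Enum)

-- Display each nucleotide without the trailing prime.
instance Show RNA where
  show A' = "A"
  show C' = "C"
  show G' = "G"
  show U' = "U"

-- Parse a single letter, returning the unconsumed rest of the input.
-- An empty result list signals a failed parse.
instance Read RNA where
  readsPrec _ s = case dropWhile isSpace s of
    ('A' : rest) -> [(A', rest)]
    ('C' : rest) -> [(C', rest)]
    ('G' : rest) -> [(G', rest)]
    ('U' : rest) -> [(U', rest)]
    _            -> []
```

With these instances, the user types and sees U, while the program still works with the distinct constructor U'.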
Note, though, that read is a partial function: it throws an error on input it cannot parse, which makes it a cause of program crashes. You're better off using readMay from the safe package, so that you can recover gracefully and give your user a polite error message and the chance to fix the input, rather than crashing. For large amounts of complex structured data, where read and readMay are needlessly slow, write a parser using Parsec or similar.
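A minimal sketch of the graceful approach follows. To stay free of extra dependencies it uses readMaybe from Text.Read in base rather than readMay from safe; both have the type Read a => String -> Maybe a. The function parseDNA and its error message are hypothetical.

```haskell
import Text.Read (readMaybe)

data DNA = A | C | G | T deriving (Eq, Show, Read, Enum)

-- Hypothetical helper: parse one nucleotide, reporting bad input
-- as a Left value instead of throwing like read would.
parseDNA :: String -> Either String DNA
parseDNA s = case readMaybe s of
  Just n  -> Right n
  Nothing -> Left ("Not a DNA nucleotide: " ++ show s)
```

The caller can then pattern match on the result and ask the user to try again when it is a Left.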