I'm not sure how to word this question. Say I'm trying to pass the paths of tmpfiles around, and I want to capture the idea that there are different formats of tmpfile, and each function only works on one of them. This works:
data FileFormat
  = Spreadsheet
  | Picture
  | Video
  deriving Show
data TmpFile = TmpFile FileFormat FilePath
  deriving Show
videoPath :: TmpFile -> FilePath
videoPath (TmpFile Video p) = p
videoPath _ = error "only works on videos!"
But there must be a better way to write it without runtime errors, right? I thought of two alternatives. This:
type TmpSpreadsheet = TmpFile Spreadsheet
type TmpPicture     = TmpFile Picture
type TmpVideo       = TmpFile Video
videoPath :: TmpVideo -> FilePath
Or this:
data TmpFile a = TmpFile a FilePath
  deriving Show
videoPath :: TmpFile Video -> FilePath
But obviously they don't compile. What's the proper way to do it? Some other ideas, none particularly appealing:
Wrap TmpFile in the format instead of the other way around, so the values are Video (TmpFile "test.avi") etc.
Make lots of separate data types VideoTmpFile, PictureTmpFile etc.
Make a TmpFile typeclass
Use partial functions everywhere, but add guard functions to abstract the pattern matching
I also considered learning the -XDataKinds extension, but suspect I'm missing something much simpler that can be done without it.
EDIT: I'm learning a lot today! I tried both the approaches outlined below (DataKinds and phantom types, which have dummy value constructors that can be removed with another extension), and they both work! Then I tried to go a little further. They both let you make a nested type TmpFile (ListOf a) in addition to regular TmpFile a, which is cool. But I've tentatively decided to go with plain phantom types (intact value constructors), because you can pattern match on them. For example, I was surprised that this actually works:
data Spreadsheet = Spreadsheet deriving Show
data Picture     = Picture     deriving Show
data Video       = Video       deriving Show
data ListOf a    = ListOf a    deriving Show
data TmpFile a = TmpFile a FilePath
  deriving Show
videoPath :: TmpFile Video -> FilePath
videoPath (TmpFile Video p) = p
-- read a file that contains a list of filenames of type a,
-- and return them as individual typed tmpfiles
listFiles :: TmpFile (ListOf a) -> IO [TmpFile a]
listFiles (TmpFile (ListOf fmt) path) = do
  txt <- readFile path
  let paths = map (TmpFile fmt) (lines txt)
  return paths
vidPath :: TmpFile Video
vidPath = TmpFile Video "video1.avi"
-- $ cat videos.txt
-- video1.avi
-- video2.avi
vidsList :: TmpFile (ListOf Video)
vidsList = TmpFile (ListOf Video) "videos.txt"
main :: IO [FilePath]
main = do
  paths <- listFiles vidsList  -- [TmpFile Video "video1.avi",TmpFile Video "video2.avi"]
  return $ map videoPath paths -- ["video1.avi","video2.avi"]
As far as I can tell, the equivalent with DataKinds is very similar, but can't access fmt as a value:
{-# LANGUAGE DataKinds, KindSignatures #-}
data FileFormat
  = Spreadsheet
  | Picture
  | Video
  | ListOf FileFormat
  deriving Show
data TmpFile (a :: FileFormat) = TmpFile FilePath
  deriving Show
vidPath :: TmpFile Video
vidPath = TmpFile "video.avi"
vidsList :: TmpFile (ListOf Video)
vidsList = TmpFile "videos.txt"
videoPath :: TmpFile Video -> FilePath
videoPath (TmpFile p) = p
listFiles :: TmpFile (ListOf a) -> IO [TmpFile a]
listFiles (TmpFile path) = do
  txt <- readFile path
  let paths = map TmpFile (lines txt)
  return paths
main :: IO [FilePath]
main = do
  paths <- listFiles vidsList
  return $ map videoPath paths
(It may seem like a weird thing to want, but my actual program is going to be an interpreter for a small language that compiles to Shake rules, with a tmpfile corresponding to each variable, so typed lists of tmpfiles will be useful.)
Does that seem right? I like the idea of DataKinds better, so I would go with it instead if I could inspect them as values, or if it turns out that's never needed.
You're right: with -XDataKinds, the TmpFile Video -> FilePath approach would work. And indeed I think this may be a good application for that extension.
{-# LANGUAGE DataKinds, KindSignatures #-}
data FileFormat = Spreadsheet | Picture | Video
data TmpFile (a :: FileFormat) = TmpFile FilePath
  deriving Show
videoPath :: TmpFile Video -> FilePath
videoPath (TmpFile p) = p
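The reason you need this extension to write TmpFile Video is that the constructors of FileFormat are, by default, value-level entities (they exist only at runtime), whereas the parameter of TmpFile is type-level (it exists only at compile time). DataKinds promotes the constructors so that they can also be used as types.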
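On the worry in the question's edit that a promoted format can no longer be inspected as a value: one common workaround is a small type class that maps each promoted constructor back to its value-level counterpart. The following is only a minimal sketch under that assumption. The names KnownFormat, formatVal, formatOf, vid and pic are hypothetical and not part of the original code, and FlexibleInstances is listed only as a precaution.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables, FlexibleInstances #-}

import Data.Proxy (Proxy (..))

data FileFormat = Spreadsheet | Picture | Video
  deriving (Show, Eq)

-- The format lives only in the type; at runtime a TmpFile is just a path.
newtype TmpFile (a :: FileFormat) = TmpFile FilePath
  deriving Show

-- Hypothetical helper class: maps each promoted constructor back to its value.
class KnownFormat (a :: FileFormat) where
  formatVal :: Proxy a -> FileFormat

instance KnownFormat 'Spreadsheet where formatVal _ = Spreadsheet
instance KnownFormat 'Picture     where formatVal _ = Picture
instance KnownFormat 'Video       where formatVal _ = Video

-- Recover the format of any typed tmpfile as an ordinary value.
formatOf :: forall a. KnownFormat a => TmpFile a -> FileFormat
formatOf _ = formatVal (Proxy :: Proxy a)

videoPath :: TmpFile 'Video -> FilePath
videoPath (TmpFile p) = p

vid :: TmpFile 'Video
vid = TmpFile "video.avi"

pic :: TmpFile 'Picture
pic = TmpFile "photo.png"

-- videoPath pic  -- rejected by the type checker: 'Picture is not 'Video
```

The tick in 'Video just makes explicit that the promoted constructor is meant; GHC usually accepts the name without it when there is no ambiguity.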
Of course there's another way to generate type-level entities: define types!
data Spreadsheet = Spreadsheet
data Picture = Picture
data Video = Video
data TmpFile a = TmpFile a FilePath
  deriving Show
videoPath :: TmpFile Video -> FilePath
videoPath (TmpFile Video p) = p
Types used purely as compile-time tags like this are usually called phantom types. But really, they're a bit of a hack to work around the former lack of proper type-level values, which DataKinds has now given us. So, unless you need compatibility with old compilers, do use DataKinds!
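If you do stay with phantom types, the dummy value constructors are optional. Here is a rough sketch, assuming empty data declarations for the tags and a parameter that never appears on the right-hand side. parseList is a hypothetical helper that covers only the pure part of the question's listFiles.

```haskell
-- Tag types with no constructors: they exist only at the type level.
data Spreadsheet
data Picture
data Video
data ListOf a

-- 'a' is a true phantom parameter: it does not appear on the right-hand side.
newtype TmpFile a = TmpFile FilePath
  deriving Show

videoPath :: TmpFile Video -> FilePath
videoPath (TmpFile p) = p

-- Split the contents of a list file into typed tmpfiles (pure part only).
-- 'parseList' is a hypothetical name, not from the original post.
parseList :: TmpFile (ListOf a) -> String -> [TmpFile a]
parseList _ txt = map TmpFile (lines txt)

vidsList :: TmpFile (ListOf Video)
vidsList = TmpFile "videos.txt"
```

The trade-off is the one noted in the question: without value constructors there is nothing to pattern match on, so the tag can only steer the type checker.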
An alternative would be to not enforce the file type at compile time, but simply make it explicit that the functions are partial.
data TmpFile = TmpFile FileFormat FilePath
  deriving Show
videoPath :: TmpFile -> Maybe FilePath
videoPath (TmpFile Video p) = Just p
videoPath _ = Nothing
In fact, that approach might well be the more rational one, depending on what you're planning to do.
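To illustrate how the Maybe version is used, here is a small sketch that filters a mixed list down to the video paths with mapMaybe from Data.Maybe. The sample data (tmpFiles) and the name videoPaths are made up for the example.

```haskell
import Data.Maybe (mapMaybe)

data FileFormat = Spreadsheet | Picture | Video
  deriving (Show, Eq)

data TmpFile = TmpFile FileFormat FilePath
  deriving Show

-- Total function: the caller has to deal with the non-video case.
videoPath :: TmpFile -> Maybe FilePath
videoPath (TmpFile Video p) = Just p
videoPath _                 = Nothing

-- Hypothetical sample data for illustration.
tmpFiles :: [TmpFile]
tmpFiles =
  [ TmpFile Video "clip.avi"
  , TmpFile Picture "photo.png"
  , TmpFile Video "intro.avi"
  ]

-- Keep only the paths of the videos; other formats are skipped, not crashed on.
videoPaths :: [FilePath]
videoPaths = mapMaybe videoPath tmpFiles
```

Here the format check still happens at runtime, but the type of videoPath makes the possible failure visible to every caller.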
First of all, I would advise against using such exotic extensions as "DataKinds" unless you absolutely need them. The reason is quite practical and general: the more language concepts you use to solve your problem, the harder it is to reason about your code.
Besides, "DataKinds" isn't an easy concept to wrap your head around. It is a transitional concept that crosses two universes simultaneously: the values and the types. Personally I find it quite controversial and would only apply it when I have no other option.
In your case you've already found two simpler ways of approaching your problem, without "DataKinds":
Wrap TmpFile in the format instead of the other way around, so the values are Video (TmpFile "test.avi") etc.
Make lots of separate data types VideoTmpFile, PictureTmpFile etc.
I particularly like the idea of the wrapping types, because it is flexible and composable. Here's how I'd build on it:
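{-# LANGUAGE DeriveTraversable #-}  -- also enables DeriveFunctor and DeriveFoldable
newtype Video a =
  Video a
  deriving (Functor, Foldable, Traversable)
newtype Picture a =
  Picture a
  deriving (Functor, Foldable, Traversable)
videoPath :: Video FilePath -> FilePath
videoPath (Video p) = p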
You can notice two things:
Video and Picture are general concepts, which are not bound to just your temporary files, and they already implement some standard interfaces. This means that they can be reused for other purposes.
There is an obvious pattern in the definitions of Video and Picture.
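As a rough illustration of what those standard interfaces buy you, here is a sketch that uses fmap and traverse on the wrapper. inTmpDir and checkAvi are hypothetical helpers, and the ".avi" check is only an assumption made for the example.

```haskell
{-# LANGUAGE DeriveTraversable #-}

import Data.List (isSuffixOf)

newtype Video a = Video a
  deriving (Show, Eq, Functor, Foldable, Traversable)

videoPath :: Video FilePath -> FilePath
videoPath (Video p) = p

-- Functor: transform the wrapped path without losing the Video tag.
-- 'inTmpDir' is a hypothetical helper.
inTmpDir :: Video FilePath -> Video FilePath
inTmpDir = fmap ("/tmp/" ++)

-- Traversable: run a check that may fail on the wrapped value.
-- 'checkAvi' is a hypothetical validator, assuming videos end in ".avi".
checkAvi :: Video FilePath -> Maybe (Video FilePath)
checkAvi = traverse (\p -> if ".avi" `isSuffixOf` p then Just p else Nothing)
```

Because the tag stays in the type through both operations, videoPath can still only be applied to something that was wrapped as a Video.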
The pattern that you see in Video and Picture can be called "refinement types", and it is abstracted over in the "refined" package, among others. So you might be interested in that.
As for your other options:
Make a TmpFile typeclass
Use partial functions everywhere, but add guard functions to abstract the pattern matching
That's a definite "No" to both. Don't breed typeclasses; reserve them for the truly general concepts, which have laws and likely a (Category) theory behind them. The language provides you with plenty of other ways to abstract. Also, don't let partial functions leak into your APIs: there's a consensus in the community that it's an antipattern.