I'm trying to implement some kind of message parser in Haskell, so I decided to use types for message types, not constructors:
data DebugMsg = DebugMsg String
data UpdateMsg = UpdateMsg [String]
.. and so on. I belive it is more useful to me, because I can define typeclass, say, Msg
for message with all information/parsers/actions related to this message.
But I have problem here. When I try to write parsing function using case
:
parseMsg :: (Msg a) => Int -> Get a
parseMsg code =
case code of
1 -> (parse :: Get DebugMsg)
2 -> (parse :: Get UpdateMsg)
..type of case result should be same in all branches. Is there any solution? And does it even possible specifiy only typeclass for function result and expect it to be fully polymorphic?
The case expression in Haskell Many imperative languages have Switch case syntax: we take a variable and execute blocks of code for specific values of that variable. We might also include a catch-all block of code in case the variable has some value for which we didn't set up a case.
Haskell has three basic ways to declare a new type: The data declaration, which defines new data types. The type declaration for type synonyms, that is, alternative names for existing types. The newtype declaration, which defines new data types equivalent to existing ones.
Haskell also incorporates polymorphic types---types that are universally quantified in some way over all types. Polymorphic type expressions essentially describe families of types. For example, (forall a)[a] is the family of types consisting of, for every type a, the type of lists of a.
Yes, all the right hand sides of all your subcases must have the exact same type; and this type must be the same as the type of the whole case
expression. This is a feature; it's required for the language to be able to guarantee at compilation time that there cannot be any type errors at runtime.
Some of the comments on your question mention that the simplest solution is to use a sum (a.k.a. variant) type:
data ParserMsg = DebugMsg String | UpdateMsg [String]
A consequence of this is that the set of alternative results is defined ahead of time. This is sometimes an upside (your code can be certain that there are no unhandled subcases), sometimes a downside (there is a finite number of subcases and they are determined at compilation time).
A more advanced solution in some cases—which you might not need, but I'll just throw it in—is to refactor the code to use functions as data. The idea is that you create a datatype that has functions (or monadic actions) as its fields, and then different behaviors = different functions as record fields.
Compare these two styles with this example. First, specifying different cases as a sum (this uses GADTs, but should be simple enough to understand):
{-# LANGUAGE GADTs #-}
import Data.Vector (Vector, (!))
import qualified Data.Vector as V
type Size = Int
type Index = Int
-- | A 'Frame' translates between a set of values and consecutive array
-- indexes. (Note: this simplified implementation doesn't handle duplicate
-- values.)
data Frame p where
-- | A 'SimpleFrame' is backed by just a 'Vector'
SimpleFrame :: Vector p -> Frame p
-- | A 'ProductFrame' is a pair of 'Frame's.
ProductFrame :: Frame p -> Frame q -> Frame (p, q)
getSize :: Frame p -> Size
getSize (SimpleFrame v) = V.length v
getSize (ProductFrame f g) = getSize f * getSize g
getIndex :: Frame p -> Index -> p
getIndex (SimpleFrame v) i = v!i
getIndex (ProductFrame f g) ij =
let (i, j) = splitIndex (getSize f, getSize g) ij
in (getIndex f i, getIndex g j)
pointIndex :: Eq p => Frame p -> p -> Maybe Index
pointIndex (SimpleFrame v) p = V.elemIndex v p
pointIndex (ProductFrame f g) (p, q) =
joinIndexes (getSize f, getSize g) (pointIndex f p) (pointIndex g q)
joinIndexes :: (Size, Size) -> Index -> Index -> Index
joinIndexes (_, rsize) i j = i * rsize + j
splitIndex :: (Size, Size) -> Index -> (Index, Index)
splitIndex (_, rsize) ij = (ij `div` rsize, ij `mod` rsize)
In this first example, a Frame
can only ever be either a SimpleFrame
or a ProductFrame
, and every Frame
function must be defined to handle both cases.
Second, datatype with function members (I elide code common to both examples):
data Frame p = Frame { getSize :: Size
, getIndex :: Index -> p
, pointIndex :: p -> Maybe Index }
simpleFrame :: Eq p => Vector p -> Frame p
simpleFrame v = Frame (V.length v) (v!) (V.elemIndex v)
productFrame :: Frame p -> Frame q -> Frame (p, q)
productFrame f g = Frame newSize getI pointI
where newSize = getSize f * getSize g
getI ij = let (i, j) = splitIndex (getSize f, getSize g) ij
in (getIndex f i, getIndex g j)
pointI (p, q) = joinIndexes (getSize f, getSize g)
(pointIndex f p)
(pointIndex g q)
Here the Frame
type takes the getIndex
and pointIndex
operations as data members of the Frame
itself. There isn't a fixed compile-time set of subcases, because the behavior of a Frame
is determined by its element functions, which are supplied at runtime. So without having to touch those definitions, we could add:
import Control.Applicative ((<|>))
concatFrame :: Frame p -> Frame p -> Frame p
concatFrame f g = Frame newSize getI pointI
where newSize = getSize f + getSize g
getI ij | ij < getSize f = ij
| otherwise = ij - getSize f
pointI p = getPoint f p <|> fmap (+(getSize f)) (getPoint g p)
I call this second style "behavioral types," but that really is just me.
Note that type classes in GHC are implemented similarly to this—there is a hidden "dictionary" argument passed around, and this dictionary is a record whose members are implementations for the class methods:
data ShowDictionary a { primitiveShow :: a -> String }
stringShowDictionary :: ShowDictionary String
stringShowDictionary = ShowDictionary { primitiveShow = ... }
-- show "whatever"
-- ---> primitiveShow stringShowDictionary "whatever"
You could accomplish something like this with existential types, however it wouldn't work how you want it to, so you really shouldn't.
Doing it with normal polymorphism, as you have in your example, won't work at all. What your type says is that the function is valid for all a
--that is, the caller gets to choose what kind of message to receive. However, you have to choose the message based on the numeric code, so this clearly won't do.
To clarify: all standard Haskell type variables are universally quantified by default. You can read your type signature as ∀a. Msg a => Int -> Get a
. What this says is that the function is define for every value of a
, regardless of what the argument may be. This means that it has to be able to return whatever particular a
the caller wants, regardless of what argument it gets.
What you really want is something like ∃a. Msg a => Int -> Get a
. This is why I said you could do it with existential types. However, this is relatively complicated in Haskell (you can't quite write a type signature like that) and will not actually solve your problem correctly; it's just something to keep in mind for the future.
Fundamentally, using classes and types like this is not very idiomatic in Haskell, because that's not what classes are meant to do. You would be much better off sticking to a normal algebraic data type for your messages.
I would have a single type like this:
data Message = DebugMsg String
| UpdateMsg [String]
So instead of having a parse
function per type, just do the parsing in the parseMsg
function as appropriate:
parseMsg :: Int -> String -> Message
parseMsg n msg = case n of
1 -> DebugMsg msg
2 -> UpdateMsg [msg]
(Obviously fill in whatever logic you actually have there.)
Essentially, this is the classical use for normal algebraic data types. There is no reason to have different types for the different kinds of messages, and life is much easier if they have the same type.
It looks like you're trying to emulate sub-typing from other languages. As a rule of thumb, you use algebraic data types in place of most of the uses of sub-types in other languages. This is certainly one of those cases.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With