Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Different types in case expression result in Haskell

I'm trying to implement some kind of message parser in Haskell, so I decided to use types for message types, not constructors:

data DebugMsg  = DebugMsg String
data UpdateMsg = UpdateMsg [String]

.. and so on. I belive it is more useful to me, because I can define typeclass, say, Msg for message with all information/parsers/actions related to this message. But I have problem here. When I try to write parsing function using case:

parseMsg :: (Msg a) => Int -> Get a
parseMsg code = 
    case code of
        1 -> (parse :: Get DebugMsg)
        2 -> (parse :: Get UpdateMsg)

..type of case result should be same in all branches. Is there any solution? And does it even possible specifiy only typeclass for function result and expect it to be fully polymorphic?

like image 619
ikkeps Avatar asked Jan 31 '13 03:01

ikkeps


People also ask

What is a case expression in Haskell?

The case expression in Haskell Many imperative languages have Switch case syntax: we take a variable and execute blocks of code for specific values of that variable. We might also include a catch-all block of code in case the variable has some value for which we didn't set up a case.

How types are declared in Haskell?

Haskell has three basic ways to declare a new type: The data declaration, which defines new data types. The type declaration for type synonyms, that is, alternative names for existing types. The newtype declaration, which defines new data types equivalent to existing ones.

Can Haskell lists have different types?

Haskell also incorporates polymorphic types---types that are universally quantified in some way over all types. Polymorphic type expressions essentially describe families of types. For example, (forall a)[a] is the family of types consisting of, for every type a, the type of lists of a.


2 Answers

Yes, all the right hand sides of all your subcases must have the exact same type; and this type must be the same as the type of the whole case expression. This is a feature; it's required for the language to be able to guarantee at compilation time that there cannot be any type errors at runtime.

Some of the comments on your question mention that the simplest solution is to use a sum (a.k.a. variant) type:

data ParserMsg = DebugMsg String | UpdateMsg [String]

A consequence of this is that the set of alternative results is defined ahead of time. This is sometimes an upside (your code can be certain that there are no unhandled subcases), sometimes a downside (there is a finite number of subcases and they are determined at compilation time).

A more advanced solution in some cases—which you might not need, but I'll just throw it in—is to refactor the code to use functions as data. The idea is that you create a datatype that has functions (or monadic actions) as its fields, and then different behaviors = different functions as record fields.

Compare these two styles with this example. First, specifying different cases as a sum (this uses GADTs, but should be simple enough to understand):

{-# LANGUAGE GADTs #-}

import Data.Vector (Vector, (!))
import qualified Data.Vector as V

type Size = Int    
type Index = Int

-- | A 'Frame' translates between a set of values and consecutive array 
-- indexes.  (Note: this simplified implementation doesn't handle duplicate
-- values.)
data Frame p where 
    -- | A 'SimpleFrame' is backed by just a 'Vector'
    SimpleFrame  :: Vector p -> Frame p
    -- | A 'ProductFrame' is a pair of 'Frame's.
    ProductFrame :: Frame p -> Frame q -> Frame (p, q)

getSize :: Frame p -> Size
getSize (SimpleFrame v) = V.length v
getSize (ProductFrame f g) = getSize f * getSize g

getIndex :: Frame p -> Index -> p
getIndex (SimpleFrame v) i = v!i
getIndex (ProductFrame f g) ij = 
    let (i, j) = splitIndex (getSize f, getSize g) ij
    in (getIndex f i, getIndex g j)

pointIndex :: Eq p => Frame p -> p -> Maybe Index
pointIndex (SimpleFrame v) p = V.elemIndex v p
pointIndex (ProductFrame f g) (p, q) = 
    joinIndexes (getSize f, getSize g) (pointIndex f p) (pointIndex g q)

joinIndexes :: (Size, Size) -> Index -> Index -> Index
joinIndexes (_, rsize) i j = i * rsize + j

splitIndex :: (Size, Size) -> Index -> (Index, Index)
splitIndex (_, rsize) ij = (ij `div` rsize, ij `mod` rsize)

In this first example, a Frame can only ever be either a SimpleFrame or a ProductFrame, and every Frame function must be defined to handle both cases.

Second, datatype with function members (I elide code common to both examples):

data Frame p = Frame { getSize    :: Size
                     , getIndex   :: Index -> p
                     , pointIndex :: p -> Maybe Index }

simpleFrame :: Eq p => Vector p -> Frame p
simpleFrame v = Frame (V.length v) (v!) (V.elemIndex v)

productFrame :: Frame p -> Frame q -> Frame (p, q)
productFrame f g = Frame newSize getI pointI
    where newSize = getSize f * getSize g
          getI ij = let (i, j) = splitIndex (getSize f, getSize g) ij 
                    in (getIndex f i, getIndex g j)
          pointI (p, q) = joinIndexes (getSize f, getSize g) 
                                      (pointIndex f p) 
                                      (pointIndex g q)

Here the Frame type takes the getIndex and pointIndex operations as data members of the Frame itself. There isn't a fixed compile-time set of subcases, because the behavior of a Frame is determined by its element functions, which are supplied at runtime. So without having to touch those definitions, we could add:

import Control.Applicative ((<|>))

concatFrame :: Frame p -> Frame p -> Frame p
concatFrame f g = Frame newSize getI pointI
    where newSize = getSize f + getSize g
          getI ij | ij < getSize f = ij
                  | otherwise      = ij - getSize f
          pointI p = getPoint f p <|> fmap (+(getSize f)) (getPoint g p)

I call this second style "behavioral types," but that really is just me.

Note that type classes in GHC are implemented similarly to this—there is a hidden "dictionary" argument passed around, and this dictionary is a record whose members are implementations for the class methods:

data ShowDictionary a { primitiveShow :: a -> String }

stringShowDictionary :: ShowDictionary String
stringShowDictionary = ShowDictionary { primitiveShow = ... }

-- show "whatever"
-- ---> primitiveShow stringShowDictionary "whatever"
like image 96
Luis Casillas Avatar answered Nov 07 '22 16:11

Luis Casillas


You could accomplish something like this with existential types, however it wouldn't work how you want it to, so you really shouldn't.

Doing it with normal polymorphism, as you have in your example, won't work at all. What your type says is that the function is valid for all a--that is, the caller gets to choose what kind of message to receive. However, you have to choose the message based on the numeric code, so this clearly won't do.

To clarify: all standard Haskell type variables are universally quantified by default. You can read your type signature as ∀a. Msg a => Int -> Get a. What this says is that the function is define for every value of a, regardless of what the argument may be. This means that it has to be able to return whatever particular a the caller wants, regardless of what argument it gets.

What you really want is something like ∃a. Msg a => Int -> Get a. This is why I said you could do it with existential types. However, this is relatively complicated in Haskell (you can't quite write a type signature like that) and will not actually solve your problem correctly; it's just something to keep in mind for the future.

Fundamentally, using classes and types like this is not very idiomatic in Haskell, because that's not what classes are meant to do. You would be much better off sticking to a normal algebraic data type for your messages.

I would have a single type like this:

data Message = DebugMsg String
             | UpdateMsg [String]

So instead of having a parse function per type, just do the parsing in the parseMsg function as appropriate:

parseMsg :: Int -> String -> Message
parseMsg n msg = case n of
  1 -> DebugMsg msg
  2 -> UpdateMsg [msg]

(Obviously fill in whatever logic you actually have there.)

Essentially, this is the classical use for normal algebraic data types. There is no reason to have different types for the different kinds of messages, and life is much easier if they have the same type.

It looks like you're trying to emulate sub-typing from other languages. As a rule of thumb, you use algebraic data types in place of most of the uses of sub-types in other languages. This is certainly one of those cases.

like image 39
Tikhon Jelvis Avatar answered Nov 07 '22 17:11

Tikhon Jelvis