Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Encoding correctness constraints directly into the Haskell type system?

Tags:

haskell

While reading the article written by matt might, I saw the following sentance.

experienced programmers become adept at encoding correctness constraints directly into the Haskell type system

Can someone explain the meaning of this sentence or provide a short example?

like image 906
junhee Avatar asked Apr 25 '13 05:04

junhee


2 Answers

This can cover a really wide variety of different techniques. The simplest is basically unavoidable: if you want a value that can be null, that can depend on mutable state or user input, you have to mark that with the type system. This is what Maybe, ST and IO do respectively. So if you have something not in one of the three types above, you know it has to be a referentially transparent value that cannot be null.

The techniques above are very fundamental to the language and basically unavoidable. However, there are other ways to use the type system to improve security and correctness that are a bit more interesting.

One useful example is preventing SQL injection. SQL injection is a common problem in web applications--for the basic idea, check out this XKCD cartoon. We can actually use the type system to ensure that any string passed to the database has been sanitized. The basic idea is to create a new type for "raw" strings:

newtype Raw a = Raw a

Then, make sure that all your functions for getting input from the user return Raw values instead of normal strings. Finally, you just need a sanitization function:

sanitize :: Raw String -> String

Since normal functions accept String rather than Raw, you will not be able to accidentally pass in an unsanitized string. And since we defined Raw using newtype, it has no runtime overhead at all.

Yesod, one of the main Haskell web frameworks, uses a technique something like this to prevent SQL injection. It also has some other cool approaches like using the type system to prevent broken links within your database; you should check it out.

At the really extreme end, you can even use the type system to ensure that matrices are of the right size. Here's a very simple way to do this. First, we need type-level numbers:

data Z
data S n

(We're using Peano Arithmetic at the type level here.) The idea is simple: Z is 0 and S is the successor function, so S Z is 1, S (S Z) is 2 and so on.

We can now write a safe matrix multiply function:

matMul :: Mat a b -> Mat b c -> Mat a c

This function will only let you multiply the matrices if the inner dimension matches, and ensures that the resulting matrix has the right dimensions in its type.

Type-safe matrix multiplication has more details.

like image 115
Tikhon Jelvis Avatar answered Oct 23 '22 18:10

Tikhon Jelvis


Well, in my opinion, one of the things that expositions like the one you've seen often fail to convey is that this correctness-by-type stuff is often stuff that doesn't happen "naturally," but rather that comes from using techniques to design your types that are not obvious to programmers who come from other languages. One of my favorite examples comes from the Haskell Wiki page on phantom types; if you look at section 1 on that page, they have this example (which IMO should be a newtype declaration instead of data):

data FormData a = FormData String

What's the a doing? Well, what it does is it artificially makes it so that FormData "foo" :: FormData Validated and FormData "foo" :: FormData Unvalidated, despite their "really" being the same, now have incompatible types, and thus you can force your code not to mix one and the other. Well, let me not repeat what the page says, it's relatively easy to read (at least Section 1).

A more complicated example that I've been using in one of my on-and-off projects: OLAP hypercubes can be seen as a kind of array indexed not by integer indexes, but rather by data model objects like persons, days, product lines, etc.:

-- | The type of Hypercubes.
data Hypercube point value = ...

-- | Access a data point in a hypercube.
get :: Eq point => Hypercube point value -> point -> value

-- | This is totally pseudocode...
data Salesperson = Mary | Joe | Irma deriving Eq
data Month = January | February | ... | December deriving Eq
data ProductLine = Widget | Gagdet | Thingamabob

-- Pseudo-example: compute sales numbers grouped by Salesperson, Month and
-- ProductLine for the combinations specified as the "frame"
salesResult :: HyperCube (Salesperson, Month, ProductLine) Dollars 
salesResult = execute salesQuery frame
    where frame = [Joe, Mary] `by` [March, April] `by` [Widgets, Gadgets]
          salesQuery = ...

-- Read from salesResult how much Mary sold in Widgets on April.
example :: Dollars
example = get salesResult (Mary, April, Widgets)

I hope that makes more sense than I fear it does. Anyway, the point the example is the following problem: the type of get, as laid out here, allows you to ask a Hypercube to tell you the value of a point it doesn't have:

badExample :: Dollar
badExample = get salesResult (Irma, January, Thingamabob)

One possible solution to this is to make the get operation return Maybe value instead of just value. But we can actually do better; we can design an API where a Hypercube can only be asked about values that it does contain. The key is similar to the FormData example, but a more sophisticated variant. First we introduce this phantom type:

data Cell tag point = Cell { getPoint :: point } deriving Eq

Now we reformulate Hypercube and get to be tag-sensitive. I'm actually going to make it more concrete in this reformulated example. We start with this:

{-# LANGUAGE ExistentialTypes #-}

data AuxCube tag point value = 
    AuxCube { getFrame :: [Cell tag point]
            , get :: Cell tag point -> value }

-- This is using a type system extension called ExistentialTypes:
data Hypercube point value = forall tag. Hypercube (AuxCube tag point value)

-- How to use one of these cubes.  Suppose we have:
salesResult :: Hypercube (Salesperson, Month, ProductLine) Dollars 
salesResult = execute salesQuery points
    where points = [Joe, Mary] `by` [March, April] `by` [Widgets, Gadgets]
          salesQuery = ...

-- Now to read values, we have to do something like this:
example = case salesResult of
              Hypercube (AuxCube frame getter) -> getter (head frame)

I apologize if the use of ExistentialTypes confuses you here, but to make a long story short, what it does in this example is basically that each Hypercube contains an AuxCube with a unique anonymous tag type parameter, so that now no two Hypercubes can have Cells of the same type. And for this reason, if we use the module system to prevent callers from constructing Cells, callers cannot possibly ask a Hypercube for a Cell that it doesn't have a value for.

Credits: I learned this technique by asking here in Stack Overflow.

like image 7
Luis Casillas Avatar answered Oct 23 '22 18:10

Luis Casillas