While reading the article written by matt might, I saw the following sentance.
experienced programmers become adept at encoding correctness constraints directly into the Haskell type system
Can someone explain the meaning of this sentence or provide a short example?
This can cover a really wide variety of different techniques. The simplest is basically unavoidable: if you want a value that can be null, that can depend on mutable state or user input, you have to mark that with the type system. This is what Maybe
, ST
and IO
do respectively. So if you have something not in one of the three types above, you know it has to be a referentially transparent value that cannot be null.
The techniques above are very fundamental to the language and basically unavoidable. However, there are other ways to use the type system to improve security and correctness that are a bit more interesting.
One useful example is preventing SQL injection. SQL injection is a common problem in web applications--for the basic idea, check out this XKCD cartoon. We can actually use the type system to ensure that any string passed to the database has been sanitized. The basic idea is to create a new type for "raw" strings:
newtype Raw a = Raw a
Then, make sure that all your functions for getting input from the user return Raw
values instead of normal strings. Finally, you just need a sanitization function:
sanitize :: Raw String -> String
Since normal functions accept String
rather than Raw
, you will not be able to accidentally pass in an unsanitized string. And since we defined Raw
using newtype
, it has no runtime overhead at all.
Yesod, one of the main Haskell web frameworks, uses a technique something like this to prevent SQL injection. It also has some other cool approaches like using the type system to prevent broken links within your database; you should check it out.
At the really extreme end, you can even use the type system to ensure that matrices are of the right size. Here's a very simple way to do this. First, we need type-level numbers:
data Z
data S n
(We're using Peano Arithmetic at the type level here.) The idea is simple: Z
is 0 and S
is the successor function, so S Z
is 1, S (S Z)
is 2 and so on.
We can now write a safe matrix multiply function:
matMul :: Mat a b -> Mat b c -> Mat a c
This function will only let you multiply the matrices if the inner dimension matches, and ensures that the resulting matrix has the right dimensions in its type.
Type-safe matrix multiplication has more details.
Well, in my opinion, one of the things that expositions like the one you've seen often fail to convey is that this correctness-by-type stuff is often stuff that doesn't happen "naturally," but rather that comes from using techniques to design your types that are not obvious to programmers who come from other languages. One of my favorite examples comes from the Haskell Wiki page on phantom types; if you look at section 1 on that page, they have this example (which IMO should be a newtype
declaration instead of data
):
data FormData a = FormData String
What's the a
doing? Well, what it does is it artificially makes it so that FormData "foo" :: FormData Validated
and FormData "foo" :: FormData Unvalidated
, despite their "really" being the same, now have incompatible types, and thus you can force your code not to mix one and the other. Well, let me not repeat what the page says, it's relatively easy to read (at least Section 1).
A more complicated example that I've been using in one of my on-and-off projects: OLAP hypercubes can be seen as a kind of array indexed not by integer indexes, but rather by data model objects like persons, days, product lines, etc.:
-- | The type of Hypercubes.
data Hypercube point value = ...
-- | Access a data point in a hypercube.
get :: Eq point => Hypercube point value -> point -> value
-- | This is totally pseudocode...
data Salesperson = Mary | Joe | Irma deriving Eq
data Month = January | February | ... | December deriving Eq
data ProductLine = Widget | Gagdet | Thingamabob
-- Pseudo-example: compute sales numbers grouped by Salesperson, Month and
-- ProductLine for the combinations specified as the "frame"
salesResult :: HyperCube (Salesperson, Month, ProductLine) Dollars
salesResult = execute salesQuery frame
where frame = [Joe, Mary] `by` [March, April] `by` [Widgets, Gadgets]
salesQuery = ...
-- Read from salesResult how much Mary sold in Widgets on April.
example :: Dollars
example = get salesResult (Mary, April, Widgets)
I hope that makes more sense than I fear it does. Anyway, the point the example is the following problem: the type of get
, as laid out here, allows you to ask a Hypercube
to tell you the value of a point it doesn't have:
badExample :: Dollar
badExample = get salesResult (Irma, January, Thingamabob)
One possible solution to this is to make the get
operation return Maybe value
instead of just value
. But we can actually do better; we can design an API where a Hypercube
can only be asked about values that it does contain. The key is similar to the FormData
example, but a more sophisticated variant. First we introduce this phantom type:
data Cell tag point = Cell { getPoint :: point } deriving Eq
Now we reformulate Hypercube
and get
to be tag-sensitive. I'm actually going to make it more concrete in this reformulated example. We start with this:
{-# LANGUAGE ExistentialTypes #-}
data AuxCube tag point value =
AuxCube { getFrame :: [Cell tag point]
, get :: Cell tag point -> value }
-- This is using a type system extension called ExistentialTypes:
data Hypercube point value = forall tag. Hypercube (AuxCube tag point value)
-- How to use one of these cubes. Suppose we have:
salesResult :: Hypercube (Salesperson, Month, ProductLine) Dollars
salesResult = execute salesQuery points
where points = [Joe, Mary] `by` [March, April] `by` [Widgets, Gadgets]
salesQuery = ...
-- Now to read values, we have to do something like this:
example = case salesResult of
Hypercube (AuxCube frame getter) -> getter (head frame)
I apologize if the use of ExistentialTypes
confuses you here, but to make a long story short, what it does in this example is basically that each Hypercube
contains an AuxCube
with a unique anonymous tag type parameter, so that now no two Hypercube
s can have Cell
s of the same type. And for this reason, if we use the module system to prevent callers from constructing Cell
s, callers cannot possibly ask a Hypercube
for a Cell
that it doesn't have a value for.
Credits: I learned this technique by asking here in Stack Overflow.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With