I was wondering how pattern matching in Haskell works. I am aware of this thread, but could not quite understand the answers therein.
(==)
and Eq
is implemented by use of pattern matching.Can you tell me why match
and match3
are working even if I omit deriving (Eq)
in the following code snippet, (it is clear why match2
is failing)?
data MyType = TypeA | TypeB
deriving (Eq)
match :: MyType -> String
match TypeA = "this is type A"
match TypeB = "this is type B"
match2 :: MyType -> String
match2 a | a == TypeA = "this is type A matched by equality"
| a == TypeB = "this is type B matched by equality"
| otherwise = "this is neither type A nor type B"
match3 :: MyType -> String
match3 a = case a of TypeA -> "this is type A matched by case expression"
TypeB -> "this is type B matched by case expression"
main :: IO ()
main = do
(print . match) TypeA
(print . match) TypeB
(print . match2) TypeA
(print . match2) TypeB
(print . match3) TypeA
(print . match3) TypeB
Pattern matching consists of specifying patterns to which some data should conform and then checking to see if it does and deconstructing the data according to those patterns. When defining functions, you can define separate function bodies for different patterns.
We use pattern matching in Haskell to simplify our codes by identifying specific types of expression. We can also use if-else as an alternative to pattern matching. Pattern matching can also be seen as a kind of dynamic polymorphism where, based on the parameter list, different methods can be executed.
Pattern matching is used to determine whether source files of high-level languages are syntactically correct. It is also used to find and replace a matching pattern in a text or code with another text/code. Any application that supports search functionality uses pattern matching in one way or another.
x:xs is simply another notation for lists in Haskell, where x represents the head of the list and xs (pronounced exes) represents the tail. So, again for the list [1, 2, 3], x would be 1, whilst xs would be [2, 3].
I just want to point out that data types and pattern matching (to a first approximation) are merely useful but redundant language features, that can be implemented purely using lambda calculus. If you understand how to implement them in lambda calculus, you can understand why they don't need Eq
to implement pattern matching.
Implementing data types in lambda calculus is known as "Church-encoding" them (after Alonzo Church, who demonstrated how to do this). Church-encoded functions are also known as "Continuation-passing style".
It's called "continuation-passing style" because instead of providing the value, you provide a function that will process the value. So for example, instead of giving a user an Int
, I could instead give them a value of the following type:
type IndirectInt = forall x . (Int -> x) -> x
The above type is "isomorphic" to an Int
. "Isomorphic" is just a fancy way of saying that we can convert any IndirectInt
into an Int
:
fw :: IndirectInt -> Int
fw indirect = indirect id
... and we can convert any Int
into an IndirectInt
:
bw :: Int -> IndirectInt
bw int = \f -> f int
... such that:
fw . bw = id -- Exercise: Prove this
bw . fw = id -- Exercise: Prove this
Using continuation-passing style, we can convert any data type into a lambda-calculus term. Let's start with a simple type like:
data Either a b = Left a | Right b
In continuation-passing style, this would become:
type IndirectEither a b = forall x . (Either a b -> x) -> x
But Alonzo Church was smart and noticed that for any type with multiple constructors, we can just provide a separate function for each constructor. So in the case of the above type, instead of providing a function of type (Either a b -> x)
, we can instead provide two separate functions, one for the a
, and one for the b
, and that would be just as good:
type IndirectEither a b = forall x . (a -> x) -> (b -> x) -> x
-- Exercise: Prove that this definition is isomorphic to the previous one
What about a type like Bool
where the constructors have no arguments? Well, Bool
is isomorphic to Either () ()
(Exercise: Prove this), so we can just encode it as:
type IndirectBool = forall x . (() -> x) -> (() -> x) -> x
And () -> x
is just isomorphic to x
(Exercise: Prove this), so we can further rewrite it as:
type IndirectBool = forall x . x -> x -> x
There are only two functions that can have the above type:
true :: a -> a -> a
true a _ = a
false :: a -> a -> a
false _ a = a
Because of the isomorphism, we can guarantee that all Church encodings will have as many implementations as there were possible values of the original data type. So it's no coincidence that there are exactly two functions that inhabit IndirectBool
, just like there are exactly two constructors for Bool
.
But how do we pattern-match on our IndirectBool
? For example, with an ordinary Bool
, we could just write:
expression1 :: a
expression2 :: a
case someBool of
True -> expression1
False -> expression2
Well, with our IndirectBool
it already comes with the tools to deconstruct itself. We would just write:
myIndirectBool expression1 expression2
Notice that if myIndirectBool
is true
, it will pick the first expression, and if it is false
it will pick the second expression, just as if we had somehow pattern-matched on its value.
Let's try to do the same thing with an IndirectEither
. Using an ordinary Either
, we'd write:
f :: a -> c
g :: b -> c
case someEither of
Left a -> f a
Right b -> g b
With the IndirectEither
, we'd just write:
someIndirectEither f g
In short, when we write types in continuation-passing style, the continuations are like the case statements of a case construct, so all we are doing is passing each different case statement as arguments to the function.
This is the reason you don't need any sense of Eq
to pattern-match on a type. In lambda calculus, the type decides what it is "equal" to, simply by defining which argument it selects out of the ones provided to it.
So if I'm a true
, I prove I am "equal" to true
by always selecting my first argument. If I'm a false
, I prove I am "equal" to false
by always selecting my second argument. In short, constructor "equality" boils down to "positional equality", which is always defined in a lambda calculus, and if we can distinguish one parameter as the "first" and another as the "second", that's all we need to have the ability to "compare" constructors.
First of all, match
and match3
are actually exactly the same (ignoring the different strings): pattern matching in functions is desugared to a case statement.
Next, pattern matching works on constructors rather than arbitrary values. Pattern matching is built into the language and does not depend on boolean expressions--it just acts on the constructors directly. This is most evident with more complex matches that include some matchable terms:
f :: MyType -> Int
f (A a) = a + 1
f (B a b) = a + b
How would you rewrite these patterns into boolean expressions? You can't (without knowing anything else about MyType
).
Instead, the pattern matching just goes by constructor. Each pattern has to be headed by a constructor--you can't write a pattern like f (a b c)
in Haskell. Then, when the function gets a value, it can look at the value's constructor and jump to the appropriate cases immediately. This is the way it has to work for more complicated patterns (like A a
), and is also the way it works for the simple patterns you used.
Since pattern matching just works by going to the appropriate constructor, it does not depend on Eq
at all. Not only do you not have to have an Eq
instance to pattern match, but having one also does not change how pattern matching behaves. For example, try this:
data MyType = TypeA | TypeB | TypeC
instance Eq MyType where
TypeA == TypeA = True
TypeB == TypeC = True
TypeC == TypeB = True
_ == _ = False
match :: MyType → String
match TypeA = "this is type A"
match TypeB = "this is type B"
match TypeC = "this is type C"
match2 :: MyType → String
match2 a | a == TypeA = "this is type A matched by equality"
| a == TypeC = "this is type B matched by equality"
| a == TypeB = "this is type C matched by equality"
| otherwise = "this is neither type A nor type B"
Now you've defined equality such that TypeB
is equal to TypeC
but not to itself. (In real life, you should ensure that equality behaves reasonably and follows the reflexive property, but this is a toy example.) Now, if you use pattern matching, TypeB
still matches TypeB
and TypeC
matches TypeC
. But if you use your guard expressions, TypeB
actually matches TypeC
and TypeC
matches TypeB
. TypeA
is unchanged between the two.
Moreover, note how the Eq
instance was defined using pattern matching. When you use a deriving
clause, it gets defined in a similar way with code generated at compile time. So pattern matching is more fundamental than Eq
--it is part of the language where Eq
is just part of the standard library.
In summary: pattern matching is built into the language and works by comparing the constructor and then recursively matching on the rest of the value. Equality is usually implemented in terms of pattern matching and compares the whole value rather than just the constructor.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With