How do you determine whether a given pattern is "good", specifically whether it is exhaustive and non-overlapping, for ML-style programming languages? Suppose you have patterns like: <pre class="prettyprint"><code>match lst with x :: y :: [] -> ... [] -> ... </code></pre> or: <pre class="prettyprint"><code>match lst with x :: xs -> ... x :: [] -> ... [] -> ... </code></pre> A good type checker would warn that the first is not exhaustive and the second is overlapping. How would the type checker make those kinds of decisions in general, for arbitrary data types?

Here's a sketch of an algorithm. It's also the basis of Lennart Augustsson's celebrated technique for compiling pattern matching efficiently. (The paper is in that incredible FPCA proceedings (LNCS 201) with oh so many hits.) The idea is to reconstruct an exhaustive, non-redundant analysis by repeatedly splitting the most general pattern into constructor cases. In general, the problem is that your program has a possibly empty bunch of ‘actual’ patterns {p1, .., pn}, and you want to know if they cover a given ‘ideal’ pattern q. To kick off, take q to be a variable x. The invariant, initially satisfied and subsequently maintained, is that each pi is σiq for some substitution σi mapping variables to patterns. How to proceed. If n=0, the bunch is empty, so you have a possible case q that isn't covered by a pattern. Complain that the ps are not exhaustive. If σ1 is an injective renaming of variables, then p1 catches every case that matches q, so we're warm: if n=1, we win; if n>1 then oops, there's no way p2 can ever be needed. Otherwise, we have that for some variable x, σ1x is a constructor pattern. In that case split the problem into multiple subproblems, one for each constructor cj of x's type. That is, split the original q into multiple ideal patterns qj = [x:=cj y1 .. yarity(cj)]q, and refine the patterns accordingly for each qj to maintain the invariant, dropping those that don't match. Let's take the example with <code>{[], x :: y :: zs}</code> (using <code>::</code> for <code>cons</code>). We start with <pre class="prettyprint"><code> xs covering {[], x :: y :: zs} </code></pre> and we have [xs := []] making the first pattern an instance of the ideal. So we split xs, getting <pre class="prettyprint"><code> [] covering {[]} x :: ys covering {x :: y :: zs} </code></pre> The first of these is justified by the empty injective renaming, so is ok. The second takes [x := x, ys := y :: zs], so we're away again, splitting ys, getting. <pre class="prettyprint"><code> x :: [] covering {} x :: y :: zs covering {x :: y :: zs} </code></pre> and we can see from the first subproblem that we're banjaxed. The overlap case is more subtle and allows for variations, depending on whether you want to flag up any overlap, or just patterns which are completely redundant in a top-to-bottom priority order. Your basic rock'n'roll is the same. E.g., start with <pre class="prettyprint"><code> xs covering {[], ys} </code></pre> with [xs := []] justifying the first of those, so split. Note that we have to refine ys with constructor cases to maintain the invariant. <pre class="prettyprint"><code> [] covering {[], []} x :: xs covering {y :: ys} </code></pre> Clearly, the first case is strictly an overlap. On the other hand, when we notice that refining an actual program pattern is needed to maintain the invariant, we can filter out those strict refinements that become redundant and check that at least one survives (as happens in the <code>::</code> case here). So, the algorithm builds a set of ideal exhaustive overlapping patterns q in a way that's motivated by the actual program patterns p. You split the ideal patterns into constructor cases whenever the actual patterns demand more detail of a particular variable. If you're lucky, each actual pattern is covered by disjoint nonempty sets of ideal patterns and each ideal pattern is covered by just one actual pattern. The tree of case splits which yield the ideal patterns gives you the efficient jump-table driven compilation of the actual patterns. The algorithm I've presented is clearly terminating, but if there are datatypes with no constructors, it can fail to accept that the empty set of patterns is exhaustive. This is a serious issue in dependently typed languages, where exhaustiveness of conventional patterns is undecidable: the sensible approach is to allow "refutations" as well as equations. In Agda, you can write (), pronounced "my Aunt Fanny", in any place where no constructor refinement is possible, and that absolves you from the requirement to complete the equation with a return value. Every exhaustive set of patterns can be made recognizably exhaustive by adding in enough refutations. Anyhow, that's the basic picture.

Algorithm for type checking ML-like pattern matching?

Tags:

algorithm

haskell

typechecking

ocaml

How do you determine whether a given pattern is "good", specifically whether it is exhaustive and non-overlapping, for ML-style programming languages?

Suppose you have patterns like:

match lst with   x :: y :: [] -> ...   [] -> ...

or:

match lst with   x :: xs -> ...   x :: [] -> ...   [] -> ...

A good type checker would warn that the first is not exhaustive and the second is overlapping. How would the type checker make those kinds of decisions in general, for arbitrary data types?

944

asked Oct 24 '11 23:10

Tommy McGuire

1 Answers

Here's a sketch of an algorithm. It's also the basis of Lennart Augustsson's celebrated technique for compiling pattern matching efficiently. (The paper is in that incredible FPCA proceedings (LNCS 201) with oh so many hits.) The idea is to reconstruct an exhaustive, non-redundant analysis by repeatedly splitting the most general pattern into constructor cases.

In general, the problem is that your program has a possibly empty bunch of ‘actual’ patterns {p₁, .., p_n}, and you want to know if they cover a given ‘ideal’ pattern q. To kick off, take q to be a variable x. The invariant, initially satisfied and subsequently maintained, is that each p_i is σ_iq for some substitution σ_i mapping variables to patterns.

How to proceed. If n=0, the bunch is empty, so you have a possible case q that isn't covered by a pattern. Complain that the ps are not exhaustive. If σ₁ is an injective renaming of variables, then p₁ catches every case that matches q, so we're warm: if n=1, we win; if n>1 then oops, there's no way p₂ can ever be needed. Otherwise, we have that for some variable x, σ₁x is a constructor pattern. In that case split the problem into multiple subproblems, one for each constructor c_j of x's type. That is, split the original q into multiple ideal patterns q_j = [x:=c_j y₁ .. y_{arity(c_j)}]q, and refine the patterns accordingly for each q_j to maintain the invariant, dropping those that don't match.

Let's take the example with {[], x :: y :: zs} (using :: for cons). We start with

  xs covering  {[], x :: y :: zs}

and we have [xs := []] making the first pattern an instance of the ideal. So we split xs, getting

  [] covering {[]}   x :: ys covering {x :: y :: zs}

The first of these is justified by the empty injective renaming, so is ok. The second takes [x := x, ys := y :: zs], so we're away again, splitting ys, getting.

  x :: [] covering {}   x :: y :: zs covering {x :: y :: zs}

and we can see from the first subproblem that we're banjaxed.

The overlap case is more subtle and allows for variations, depending on whether you want to flag up any overlap, or just patterns which are completely redundant in a top-to-bottom priority order. Your basic rock'n'roll is the same. E.g., start with

  xs covering {[], ys}

with [xs := []] justifying the first of those, so split. Note that we have to refine ys with constructor cases to maintain the invariant.

  [] covering {[], []}   x :: xs covering {y :: ys}

Clearly, the first case is strictly an overlap. On the other hand, when we notice that refining an actual program pattern is needed to maintain the invariant, we can filter out those strict refinements that become redundant and check that at least one survives (as happens in the :: case here).

So, the algorithm builds a set of ideal exhaustive overlapping patterns q in a way that's motivated by the actual program patterns p. You split the ideal patterns into constructor cases whenever the actual patterns demand more detail of a particular variable. If you're lucky, each actual pattern is covered by disjoint nonempty sets of ideal patterns and each ideal pattern is covered by just one actual pattern. The tree of case splits which yield the ideal patterns gives you the efficient jump-table driven compilation of the actual patterns.

The algorithm I've presented is clearly terminating, but if there are datatypes with no constructors, it can fail to accept that the empty set of patterns is exhaustive. This is a serious issue in dependently typed languages, where exhaustiveness of conventional patterns is undecidable: the sensible approach is to allow "refutations" as well as equations. In Agda, you can write (), pronounced "my Aunt Fanny", in any place where no constructor refinement is possible, and that absolves you from the requirement to complete the equation with a return value. Every exhaustive set of patterns can be made recognizably exhaustive by adding in enough refutations.

Anyhow, that's the basic picture.

answered Sep 23 '22 00:09

pigworker

Related questions
                            
                                What is the difference between O, Ω, and Θ?
                            
                                find pair of numbers in array that add to given sum
                            
                                Making Fibonacci faster [duplicate]
                            
                                Printing BFS (Binary Tree) in Level Order with Specific Formatting
                            
                                how to efficiently get the k bigger elements of a list in python
                            
                                Best word wrap algorithm? [closed]
                            
                                median of three values strategy
                            
                                How to find the element of an array that is repeated at least N/2 times?
                            
                                circle-circle collision
                            
                                Packing arbitrary triangles into a finite box?
                            
                                How did Firefox optimize this loop?
                            
                                Fastest way to sort 32bit signed integer arrays in JavaScript?
                            
                                What's a good, generic algorithm for collapsing a set of potentially-overlapping ranges?
                            
                                When to use Rabin-Karp or KMP algorithms?
                            
                                Algorithm for maze generation with no dead ends?
                            
                                How is it possible to build a suffix tree in linear time?
                            
                                Practical Uses of Fractals in Programming
                            
                                Most efficient algorithm for merging sorted IEnumerable<T>
                            
                                Finding all the shortest paths between two nodes in unweighted undirected graph
                            
                                finding a^b^c^... mod m

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With