Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm for type checking ML-like pattern matching?

How do you determine whether a given pattern is "good", specifically whether it is exhaustive and non-overlapping, for ML-style programming languages?

Suppose you have patterns like:

match lst with   x :: y :: [] -> ...   [] -> ... 

or:

match lst with   x :: xs -> ...   x :: [] -> ...   [] -> ... 

A good type checker would warn that the first is not exhaustive and the second is overlapping. How would the type checker make those kinds of decisions in general, for arbitrary data types?

like image 944
Tommy McGuire Avatar asked Oct 24 '11 23:10

Tommy McGuire


People also ask

What is pattern matching explain with algorithm and example?

What Does Pattern Matching Mean? Pattern matching in computer science is the checking and locating of specific sequences of data of some pattern among raw data or a sequence of tokens. Unlike pattern recognition, the match has to be exact in the case of pattern matching.

What is pattern matching in ML?

ML is an aspect which learns from the data without explicitly programmed, which may be iterative in nature and becomes accurate as it keeps performing tasks. ML is a form of pattern recognition which is basically the idea of training machines to recognize patterns and apply them to practical problems.

What is type checking in Haskell?

Haskell uses a system of static type checking. This means that every expression in Haskell is assigned a type.

What is pattern matching in Erlang?

Pattern matching occurs when evaluating a function call, case- receive- try- expressions and match operator (=) expressions. In a pattern matching, a left-hand side pattern is matched against a right-hand side term. If the matching succeeds, any unbound variables in the pattern become bound.


1 Answers

Here's a sketch of an algorithm. It's also the basis of Lennart Augustsson's celebrated technique for compiling pattern matching efficiently. (The paper is in that incredible FPCA proceedings (LNCS 201) with oh so many hits.) The idea is to reconstruct an exhaustive, non-redundant analysis by repeatedly splitting the most general pattern into constructor cases.

In general, the problem is that your program has a possibly empty bunch of ‘actual’ patterns {p1, .., pn}, and you want to know if they cover a given ‘ideal’ pattern q. To kick off, take q to be a variable x. The invariant, initially satisfied and subsequently maintained, is that each pi is σiq for some substitution σi mapping variables to patterns.

How to proceed. If n=0, the bunch is empty, so you have a possible case q that isn't covered by a pattern. Complain that the ps are not exhaustive. If σ1 is an injective renaming of variables, then p1 catches every case that matches q, so we're warm: if n=1, we win; if n>1 then oops, there's no way p2 can ever be needed. Otherwise, we have that for some variable x, σ1x is a constructor pattern. In that case split the problem into multiple subproblems, one for each constructor cj of x's type. That is, split the original q into multiple ideal patterns qj = [x:=cj y1 .. yarity(cj)]q, and refine the patterns accordingly for each qj to maintain the invariant, dropping those that don't match.

Let's take the example with {[], x :: y :: zs} (using :: for cons). We start with

  xs covering  {[], x :: y :: zs} 

and we have [xs := []] making the first pattern an instance of the ideal. So we split xs, getting

  [] covering {[]}   x :: ys covering {x :: y :: zs} 

The first of these is justified by the empty injective renaming, so is ok. The second takes [x := x, ys := y :: zs], so we're away again, splitting ys, getting.

  x :: [] covering {}   x :: y :: zs covering {x :: y :: zs} 

and we can see from the first subproblem that we're banjaxed.

The overlap case is more subtle and allows for variations, depending on whether you want to flag up any overlap, or just patterns which are completely redundant in a top-to-bottom priority order. Your basic rock'n'roll is the same. E.g., start with

  xs covering {[], ys} 

with [xs := []] justifying the first of those, so split. Note that we have to refine ys with constructor cases to maintain the invariant.

  [] covering {[], []}   x :: xs covering {y :: ys} 

Clearly, the first case is strictly an overlap. On the other hand, when we notice that refining an actual program pattern is needed to maintain the invariant, we can filter out those strict refinements that become redundant and check that at least one survives (as happens in the :: case here).

So, the algorithm builds a set of ideal exhaustive overlapping patterns q in a way that's motivated by the actual program patterns p. You split the ideal patterns into constructor cases whenever the actual patterns demand more detail of a particular variable. If you're lucky, each actual pattern is covered by disjoint nonempty sets of ideal patterns and each ideal pattern is covered by just one actual pattern. The tree of case splits which yield the ideal patterns gives you the efficient jump-table driven compilation of the actual patterns.

The algorithm I've presented is clearly terminating, but if there are datatypes with no constructors, it can fail to accept that the empty set of patterns is exhaustive. This is a serious issue in dependently typed languages, where exhaustiveness of conventional patterns is undecidable: the sensible approach is to allow "refutations" as well as equations. In Agda, you can write (), pronounced "my Aunt Fanny", in any place where no constructor refinement is possible, and that absolves you from the requirement to complete the equation with a return value. Every exhaustive set of patterns can be made recognizably exhaustive by adding in enough refutations.

Anyhow, that's the basic picture.

like image 81
pigworker Avatar answered Sep 23 '22 00:09

pigworker