Where and Why should I use extra empty patterns?

Question

For example:

intersectBy : (a -> a -> Bool) -> List a -> List a -> List a
intersectBy _  [] _     =  []
intersectBy _  _  []    =  []
intersectBy eq xs ys    =  [x | x <- xs, any (eq x) ys]

There are extra patterns for [] and seems like they are used in Haskell Data.List but what kind of optimization is that? And where is difference with Idris here?

I ask because I heard that "it will make reasoning about it more difficult" and person who said me that had no time to fully explain it.

I doubt if I can understand it doing "reduce the proof" of function.

May someone explain me the politics of extra patterns here from positions of Haskell and Idris so I will be able to understand and see the difference.

chi · Accepted Answer

Semantically speaking, the pattern

intersectBy _  [] _     =  []

looks redundant, even from a performance point of view. Instead, in Haskell

intersectBy _  _  []    =  []

is not redundant since otherwise

intersectBy (==) [0..] []

would diverge, since the comprehension would attempt to try all the elements x <- [0..].

I am not sure I like this special case, though. Why shouldn't we add a special case covering intersectBy (==) [0..] [2] so that it returns [2]? Further, if performance is the concern, in many cases I'd like to use a O(n log n) approach through pre-sorting, even is this does not work on infinite lists and requires Ord a.

Reid Barton · Answer

There's no need to guess when you can look up the history through git blame, GHC Trac and the libraries mailing list.

Originally the definition was just the third equation,

intersectBy eq xs ys    =  [x | x <- xs, any (eq x) ys]

In https://github.com/ghc/ghc/commit/8e8128252ee5d91a73526479f01ace8ba2537667 the second equation was added as a strictness/performance improvement, and at the same time, the first equation was added so as to make the new definition always at least as defined as the original. Otherwise, intersectBy f [] _|_ would be _|_ when it was [] before.

It seems to me that this current definition is now maximally lazy: it is as defined as possible for any inputs, except that one has to choose whether to check the left or right list for emptiness first. (And, as I mentioned above, this choice is made to be consistent with the historical definition.)

Theodore Lief Gannon · Answer

@chi explains the _ _ [] case, but _ [] _ serves a purpose as well: it dictates how intersectBy handles bottom. With the definition as written:

λ. intersectBy undefined    []     undefined
[]
λ. intersectBy   (==)    undefined    []
*** Exception: Prelude.undefined

Remove the first pattern and it becomes:

λ. intersectBy undefined undefined    []
[]
λ. intersectBy   (==)       []     undefined
*** Exception: Prelude.undefined

I'm not 100% certain of this, but I believe there's also a performance benefit to not binding anything in the first pattern. The final pattern will give the same result for xs == [] without evaluating eq or ys, but AFAIK it still allocates stack space for their thunks.

dfeuer · Answer

There is a big difference in Idris: Idris lists are always finite! Furthermore, Idris is a mostly strict (call-by-value) language, and optionally uses a totality checker, so it's pretty reasonable to assume there won't be any bottoms hiding in the argument lists. The significance of that difference is that the two definitions are much more nearly semantically identical in Idris than in Haskell. The choice of which to use may be made based on the ease of proving properties of the function, or may be based on simplicity.

Where and Why should I use extra empty patterns?

Tags:

pattern-matching

haskell

idris

cnd

4 Answers

chi

Reid Barton

Theodore Lief Gannon

dfeuer

Recent Activity

Donate For Us

Where and Why should I use extra empty patterns?

Tags:

pattern-matching

haskell

idris

cnd

4 Answers

chi

Reid Barton

Theodore Lief Gannon

dfeuer

Related questions

Recent Activity

Donate For Us