For example:
intersectBy : (a -> a -> Bool) -> List a -> List a -> List a
intersectBy _ [] _ = []
intersectBy _ _ [] = []
intersectBy eq xs ys = [x | x <- xs, any (eq x) ys]
There are extra patterns for []
and seems like they are used in Haskell Data.List
but what kind of optimization is that? And where is difference with Idris here?
I ask because I heard that "it will make reasoning about it more difficult" and person who said me that had no time to fully explain it.
I doubt if I can understand it doing "reduce the proof" of function.
May someone explain me the politics of extra patterns here from positions of Haskell and Idris so I will be able to understand and see the difference.
Semantically speaking, the pattern
intersectBy _ [] _ = []
looks redundant, even from a performance point of view. Instead, in Haskell
intersectBy _ _ [] = []
is not redundant since otherwise
intersectBy (==) [0..] []
would diverge, since the comprehension would attempt to try all the elements x <- [0..]
.
I am not sure I like this special case, though. Why shouldn't we add a special case covering intersectBy (==) [0..] [2]
so that it returns [2]
? Further, if performance is the concern, in many cases I'd like to use a O(n log n) approach through pre-sorting, even is this does not work on infinite lists and requires Ord a
.
There's no need to guess when you can look up the history through git blame
, GHC Trac and the libraries mailing list.
Originally the definition was just the third equation,
intersectBy eq xs ys = [x | x <- xs, any (eq x) ys]
In https://github.com/ghc/ghc/commit/8e8128252ee5d91a73526479f01ace8ba2537667 the second equation was added as a strictness/performance improvement, and at the same time, the first equation was added so as to make the new definition always at least as defined as the original. Otherwise, intersectBy f [] _|_
would be _|_
when it was []
before.
It seems to me that this current definition is now maximally lazy: it is as defined as possible for any inputs, except that one has to choose whether to check the left or right list for emptiness first. (And, as I mentioned above, this choice is made to be consistent with the historical definition.)
@chi explains the _ _ []
case, but _ [] _
serves a purpose as well: it dictates how intersectBy
handles bottom
. With the definition as written:
λ. intersectBy undefined [] undefined
[]
λ. intersectBy (==) undefined []
*** Exception: Prelude.undefined
Remove the first pattern and it becomes:
λ. intersectBy undefined undefined []
[]
λ. intersectBy (==) [] undefined
*** Exception: Prelude.undefined
I'm not 100% certain of this, but I believe there's also a performance benefit to not binding anything in the first pattern. The final pattern will give the same result for xs == []
without evaluating eq
or ys
, but AFAIK it still allocates stack space for their thunks.
There is a big difference in Idris: Idris lists are always finite! Furthermore, Idris is a mostly strict (call-by-value) language, and optionally uses a totality checker, so it's pretty reasonable to assume there won't be any bottoms hiding in the argument lists. The significance of that difference is that the two definitions are much more nearly semantically identical in Idris than in Haskell. The choice of which to use may be made based on the ease of proving properties of the function, or may be based on simplicity.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With