Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Haskell: Why ++ is not allowed in pattern matching?

Suppose we want to write our own sum function in Haskell:

sum' :: (Num a) => [a] -> a
sum' [] = 0
sum' (x:xs) = x + sum' xs

Why can't we do something like:

sum' :: (Num a) => [a] -> a
sum' [] = 0
sum' (xs++[x]) = x + sum' xs

In other words why can't we use ++ in pattern matching ?

like image 488
Abhisek Avatar asked Feb 26 '18 16:02

Abhisek


3 Answers

This is a deserving question, and it has so far received sensible answers (mutter only constructors allowed, mutter injectivity, mutter ambiguity), but there's still time to change all that.

We can say what the rules are, but most of the explanations for why the rules are what they are start by over-generalising the question, addressing why we can't pattern match against any old function (mutter Prolog). This is to ignore the fact that ++ isn't any old function: it's a (spatially) linear plugging-stuff-together function, induced by the zipper-structure of lists. Pattern matching is about taking stuff apart, and indeed, notating the process in terms of the plugger-togetherers and pattern variables standing for the components. Its motivation is clarity. So I'd like

lookup :: Eq k => k -> [(k, v)] -> Maybe v
lookup k (_ ++ [(k, v)] ++ _) = Just v
lookup _ _                    = Nothing

and not only because it would remind me of the fun I had thirty years ago when I implemented a functional language whose pattern matching offered exactly that.

The objection that it's ambiguous is a legitimate one, but not a dealbreaker. Plugger-togetherers like ++ offer only finitely many decompositions of finite input (and if you're working on infinite data, that's your own lookout), so what's involved is at worst search, rather than magic (inventing arbitrary inputs that arbitrary functions might have thrown away). Search calls for some means of prioritisation, but so do our ordered matching rules. Search can also result in failure, but so, again, can matching.

We have a sensible way to manage computations offering alternatives (failure and choice) via the Alternative abstraction, but we are not used to thinking of pattern matching as a form of such computation, which is why we exploit Alternative structure only in the expression language. The noble, if quixotic, exception is match-failure in do-notation, which calls the relevant fail rather than necessarily crashing out. Pattern matching is an attempt to compute an environment suitable for the evaluation of a 'right-hand side' expression; failure to compute such an environment is already handled, so why not choice?

(Edit: I should, of course, add that you only really need search if you have more than one stretchy thing in a pattern, so the proposed xs++[x] pattern shouldn't trigger any choices. Of course, it takes time to find the end of a list.)

Imagine there was some sort of funny bracket for writing Alternative computations, e.g., with (|) meaning empty, (|a1|a2|) meaning (|a1|) <|> (|a2|), and a regular old (|f s1 .. sn|) meaning pure f <*> s1 .. <*> sn. One might very well also imagine (|case a of {p1 -> a1; .. pn->an}|) performing a sensible translation of search-patterns (e.g. involving ++) in terms of Alternative combinators. We could write

lookup :: (Eq k, Alternative a) => k -> [(k, v)] -> a k
lookup k xs = (|case xs of _ ++ [(k, v)] ++ _ -> pure v|)

We may obtain a reasonable language of search-patterns for any datatype generated by fixpoints of differentiable functors: symbolic differentiation is exactly what turns tuples of structures into choices of possible substructures. Good old ++ is just the sublists-of-lists example (which is confusing, because a list-with-a-hole-for-a-sublist looks a lot like a list, but the same is not true for other datatypes).

Hilariously, with a spot of LinearTypes, we might even keep hold of holey data by their holes as well as their root, then plug away destructively in constant time. It's scandalous behaviour only if you don't notice you're doing it.

like image 193
pigworker Avatar answered Nov 15 '22 13:11

pigworker


You can only pattern match on constructors, not on general functions.

Mathematically, a constructor is an injective function: each combination of arguments gives one unique value, in this case a list. Because that value is unique, the language can deconstruct it again into the original arguments. I.e., when you pattern match on :, you essentially use the function

uncons :: [a] -> Maybe (a, [a])

which checks if the list is of a form you could have constructed with : (i.e., if it is non-empty), and if yes, gives you back the head and tail.

++ is not injective though, for example

Prelude> [0,1] ++ [2]
[0,1,2]
Prelude> [0] ++ [1,2]
[0,1,2]

Neither of these representations is the right one, so how should the list be deconstructed again?

What you can do however is define a new, “virtual” constructor that acts like : in that it always seperates exactly one element from the rest of the list (if possible), but does so on the right:

{-# LANGUAGE PatternSynonyms, ViewPatterns #-}

pattern (:>) :: [a] -> a -> [a]
pattern (xs:>ω) <- (unsnoc -> Just (xs,ω))
 where xs:>ω = xs ++ [ω]

unsnoc :: [a] -> Maybe ([a], a)
unsnoc [] = Nothing
unsnoc [x] = Just x
unsnoc (_:xs) = unsnoc xs

Then

sum' :: Num a => [a] -> a
sum' (xs:>x) = x + sum xs
sum' [] = 0

Note that this is very inefficient though, because the :> pattern-synonym actually needs to dig through the entire list, so sum' has quadratic rather than linear complexity.

A container that allows pattern matching on both the left and right end efficiently is Data.Sequence, with its :<| and :|> pattern synonyms.

like image 40
leftaroundabout Avatar answered Nov 15 '22 13:11

leftaroundabout


You can only pattern-match on data constructors, and ++ is a function, not a data constructor.

Data constructors are persistent; a value like 'c':[] cannot be simplified further, because it is a fundamental value of type [Char]. An expression like "c" ++ "d", however, can replaced with its equivalent "cd" at any time, and thus couldn't reliably be counted on to be present for pattern matching.

(You might argue that "cd" could always replaced by "c" ++ "d", but in general there isn't a one-to-one mapping between a list and a decomposition via ++. Is "cde" equivalent to "c" ++ "de" or "cd" ++ "e" for pattern matching purposes?)

like image 9
chepner Avatar answered Nov 15 '22 13:11

chepner