Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Haskell "collections" language design

Tags:

Why is the Haskell implementation so focused on linked lists?

For example, I know Data.Sequence is more efficient with most of the list operations (except for the cons operation), and is used a lot; syntactically, though, it is "hardly supported". Haskell has put a lot of effort into functional abstractions, such as the Functor and the Foldable class, but their syntax is not compatible with that of the default list.

If, in a project I want to optimize and replace my lists with sequences - or if I suddenly want support for infinite collections, and replace my sequences with lists - the resulting code changes are abhorrent.

So I guess my wondering can be made concrete in questions such as:

  1. Why isn't the type of map equal to (Functor f) => (a -> b) -> f a -> f b?
  2. Why can't the [] and (:) functions be used for, for example, the type in Data.Sequence?

I am really hoping there is some explanation for this, that doesn't include the words "backwards compatibility" or "it just grew that way", though if you think there isn't, please let me know. Any relevant language extensions are welcome as well.

like image 818
wen Avatar asked Nov 15 '10 01:11

wen


3 Answers

Before getting into why, here's a summary of the problem and what you can do about it. The constructors [] and (:) are reserved for lists and cannot be redefined. If you plan to use the same code with multiple data types, then define or choose a type class representing the interface you want to support, and use methods from that class. Here are some generalized functions that work on both lists and sequences. I don't know of a generalization of (:), but you could write your own.

  • fmap instead of map
  • mempty instead of []
  • mappend instead of (++)

If you plan to do a one-off data type replacement, then you can define your own names for things, and redefine them later.

-- For now, use lists
type List a = [a]
nil = []
cons x xs = x : xs

{- Switch to Seq in the future
-- type List a = Seq a
-- nil = empty
-- cons x xs = x <| xs
-}

Note that [] and (:) are constructors: you can also use them for pattern matching. Pattern matching is specific to one type constructor, so you can't extend a pattern to work on a new data type without rewriting the pattern-matchign code.


Why there's so much list-specific stuff in Haskell

Lists are commonly used to represent sequential computations, rather than data. In an imperative language, you might build a Set with a loop that creates elements and inserts them into the set one by one. In Haskell, you do the same thing by creating a list and then passing the list to Set.fromList. Since lists so closely match this abstraction of computation, they have a place that's unlikely to ever be superseded by another data structure.

The fact remains that some functions are list-specific when they could have been generic. Some common functions like map were made list-specific so that new users would have less to learn. In particular, they provide simpler and (it was decided) more understandable error messages. Since it's possible to use generic functions instead, the problem is really just a syntactic inconvenience. It's worth noting that Haskell language implementations have very little list-speficic code, so new data structures and methods can be just as efficient as the "built-in" ones.

There are several classes that are useful generalizations of lists:

  • Functor supplies fmap, a generalization of map.
  • Monoid supplies methods useful for collections with list-like structure. The empty list [] is generalized to other containers by mempty, and list concatenation (++) is generalized to other containers by mappend.
  • Applicative and Monad supply methods that are useful for interpreting collections as computations.
  • Traversable and Foldable supply useful methods for running computations over collections.

Of these, only Functor and Monad were in the influential Haskell 98 spec, so the others have been overlooked to varying degrees by library writers, depending on when the library was written and how actively it was maintained. The core libraries have been good about supporting new interfaces.

like image 170
Heatsink Avatar answered Sep 23 '22 11:09

Heatsink


I remember reading somewhere that map is for lists by default since newcomers to Haskell would be put off if they made a mistake and saw a complex error about "Functors", which they have no idea about. Therefore, they have both map and fmap instead of just map.

EDIT: That "somewhere" is the Monad Reader Issue 13, page 20, footnote 3:

3You might ask why we need a separate map function. Why not just do away with the current list-only map function, and rename fmap to map instead? Well, that’s a good question. The usual argument is that someone just learning Haskell, when using map incorrectly, would much rather see an error about lists than about Functors.

For (:), the (<|) function seems to be a replacement. I have no idea about [].

like image 35
li.davidm Avatar answered Sep 22 '22 11:09

li.davidm


A nitpick, Data.Sequence isn't more efficient for "list operations", it is more efficient for sequence operations. That said, a lot of the functions in Data.List are really sequence operations. The finger tree inside Data.Sequence has to do quite a bit more work for a cons (<|) equivalent to list (:), and its memory representation is also somewhat larger than a list as it is made from two data types a FingerTree and a Deep.

The extra syntax for lists is fine, it hits the sweet spot at what lists are good at - cons (:) and pattern-matching from the left. Whether or not sequences should have extra syntax is further debate, but as you can get a very long way with lists, and lists are inherently simple, having good syntax is a must.

List isn't an ideal representation for Strings - the memory layout is inefficient as each Char is wrapped with a constructor. This is why ByteStrings were introduced. Although they are laid out as an array ByteStrings have to do a bit of administrative work - [Char] can still be competitive if you are using short strings. In GHC there are language extensions to give ByteStrings more String-like syntax.

The other major lazy functional Clean has always represented strings as byte arrays, but its type system made this more practical - I believe the ByteString library uses unsafePerfomIO under the hood.

like image 5
stephen tetley Avatar answered Sep 24 '22 11:09

stephen tetley