Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is Haskell's default string implementation a linked list of chars?

Tags:

The fact that Haskell's default String implementation is not efficient both in terms of speed and memory is well known. As far as I know the [] lists in general are implemented in Haskell as singly-linked lists and for most small/simple data types (e.g. Int) it doesn't seem like a very good idea, but for String it seems like total overkill. Some of the opinions on this matter include:

Real World Haskell

On simple benchmarks like this, even programs written in interpreted languages such as Python can outperform Haskell code that uses String by an order of magnitude.

Efficient String Implementation in Haskell

Since a String is just [Char], that is a linked list of Char, it means Strings have poor locality of reference, and again means that Strings are fairly large in memory, at a minimum it's N * (21bits + Mbits) where N is the length of the string and M is the size of a pointer (...). Strings are much less likely to be able to be optimized to loops, etc. by the compiler.

I know that Haskell has ByteStrings (and Arrays) in several nice flavors and that they can do the job nicely, but I would expect the default implementation to be the most efficient one.

TL;DR: Why is Haskell's default String implementation a singly-linked list even though it is terribly inefficient and rarely used for real world applications (except for the really simple ones)? Are there historical reasons? Is it easier to implement?

like image 869
ljedrz Avatar asked Dec 13 '12 17:12

ljedrz


People also ask

Is Haskell list a linked list?

Is Haskell list a linked list? Second, lists in Haskell are (internally) implemented as linked lists. This is different from many other languages, where the word "list" and "array" is used interchangably.

Is a string a list in Haskell?

Is a string a list of char Haskell? A String is a list of characters. String constants in Haskell are values of type String .


2 Answers

Why is Haskell's default String implementation a singly-linked list

Because singly-linked lists support:

  • induction via pattern matching
  • have useful properties, such as Monad, Functor
  • are properly parametrically polymorphic
  • are naturally lazy

and so String as [Char] (unicode points) means a string type that fits the language goals (as of 1990), and essentially come "for free" with the list library.

In summary, historically the language designers were interested more in well-designed core data types, than the modern problems of text processing, so we have an elegant, easy to understand, easy to teach String type, that isn't quite a unicode text chunk, and isn't a dense, packed, strict data type.

like image 159
Don Stewart Avatar answered Oct 24 '22 11:10

Don Stewart


Efficiency is only one axis to measure an abstraction on. While lists are pretty inefficient for text-y operations, they are darn convenient in that there's a lot of list operations implemented polymorphically that have useful interpretations when specialized to [Char], so you get a lot of reuse both in the library implementation and in the user's brain.

It's not clear that, were the language being designed today from scratch with our current level of experience, the same decision would be made; however, it's not always possible to make decisions perfectly before experience is available.

like image 34
Daniel Wagner Avatar answered Oct 24 '22 09:10

Daniel Wagner