The fact that Haskell's default String
implementation is not efficient both in terms of speed and memory is well known. As far as I know the [] lists
in general are implemented in Haskell as singly-linked lists and for most small/simple data types (e.g. Int
) it doesn't seem like a very good idea, but for String
it seems like total overkill. Some of the opinions on this matter include:
Real World Haskell
On simple benchmarks like this, even programs written in interpreted languages such as Python can outperform Haskell code that uses String by an order of magnitude.
Efficient String Implementation in Haskell
Since a String is just [Char], that is a linked list of Char, it means Strings have poor locality of reference, and again means that Strings are fairly large in memory, at a minimum it's N * (21bits + Mbits) where N is the length of the string and M is the size of a pointer (...). Strings are much less likely to be able to be optimized to loops, etc. by the compiler.
I know that Haskell has ByteString
s (and Array
s) in several nice flavors and that they can do the job nicely, but I would expect the default implementation to be the most efficient one.
TL;DR: Why is Haskell's default String
implementation a singly-linked list even though it is terribly inefficient and rarely used for real world applications (except for the really simple ones)? Are there historical reasons? Is it easier to implement?
Is Haskell list a linked list? Second, lists in Haskell are (internally) implemented as linked lists. This is different from many other languages, where the word "list" and "array" is used interchangably.
Is a string a list of char Haskell? A String is a list of characters. String constants in Haskell are values of type String .
Why is Haskell's default String implementation a singly-linked list
Because singly-linked lists support:
and so String
as [Char]
(unicode points) means a string type that fits the language goals (as of 1990), and essentially come "for free" with the list library.
In summary, historically the language designers were interested more in well-designed core data types, than the modern problems of text processing, so we have an elegant, easy to understand, easy to teach String
type, that isn't quite a unicode text chunk, and isn't a dense, packed, strict data type.
Efficiency is only one axis to measure an abstraction on. While lists are pretty inefficient for text-y operations, they are darn convenient in that there's a lot of list operations implemented polymorphically that have useful interpretations when specialized to [Char]
, so you get a lot of reuse both in the library implementation and in the user's brain.
It's not clear that, were the language being designed today from scratch with our current level of experience, the same decision would be made; however, it's not always possible to make decisions perfectly before experience is available.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With