
How are lists implemented in Haskell (GHC)?

I was just curious about some exact implementation details of lists in Haskell (GHC-specific answers are fine): are they naive linked lists, or do they have any special optimizations? More specifically:

  1. Do length and (!!) (for instance) have to iterate through the list?
  2. If so, are their values cached in any way (i.e., if I call length twice, will it have to iterate both times)?
  3. Does access to the back of the list involve iterating through the whole list?
  4. Are infinite lists and list comprehensions memoized? (i.e., for fib = 1:1:zipWith (+) fib (tail fib), will each value be computed recursively, or will it rely on the previous computed value?)

Any other interesting implementation details would be much appreciated. Thanks in advance!

asked Apr 22 '10 by shosti



2 Answers

Lists have no special operational treatment in Haskell. They are defined just like:

data List a = Nil | Cons a (List a) 

Just with some special notation: [a] for List a, [] for Nil, and (:) for Cons. If you defined the same type and redefined all the operations on it, you would get exactly the same performance.
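To make that concrete, here is a sketch of such a hand-rolled list type (the names MyList, myLength, and toBuiltin are just for illustration); walking its spine costs exactly what walking a built-in list costs:

```haskell
-- A hand-rolled list type, isomorphic to the built-in [a].
data MyList a = Nil | Cons a (MyList a)

-- Length must walk the whole spine, just like Prelude's length on [a].
myLength :: MyList a -> Int
myLength Nil         = 0
myLength (Cons _ xs) = 1 + myLength xs

-- Conversion to the built-in type, for comparison.
toBuiltin :: MyList a -> [a]
toBuiltin Nil         = []
toBuiltin (Cons x xs) = x : toBuiltin xs
```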

Thus, Haskell lists are singly-linked. Because of laziness, they are often used as iterators. sum [1..n] runs in constant space, because the unused prefixes of this list are garbage collected as the sum progresses, and the tails aren't generated until they are needed.
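As a sketch of the list-as-iterator pattern: with Prelude's sum, constant space depends on GHC's strictness analysis kicking in, but an explicitly strict left fold makes it unconditional. The elements of [1..n] are produced on demand and the consumed prefix is garbage collected as the fold advances:

```haskell
import Data.List (foldl')

-- Consume a lazily generated list in constant space: the strict
-- accumulator prevents a chain of thunks from building up, and the
-- already-summed prefix of [1..n] can be collected immediately.
sumTo :: Int -> Int
sumTo n = foldl' (+) 0 [1..n]
```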

As for #4: all values in Haskell are memoized, with the exception that functions do not keep a memo table for their arguments. So when you define fib as you did, the results will be cached and the nth Fibonacci number will be accessed in O(n) time. However, if you defined it in this apparently equivalent way:

-- Simulate infinite lists as functions from Int
type List a = Int -> a

cons :: a -> List a -> List a
cons x xs n | n == 0    = x
            | otherwise = xs (n-1)

tailF :: List a -> List a
tailF xs n = xs (n+1)

fib :: List Integer
fib = 1 `cons` (1 `cons` (\n -> fib n + tailF fib n))

(Take a moment to note the similarity to your definition)

Then the results are not shared, and the nth Fibonacci number will be accessed in O(fib n) time, which is exponential. You can get sharing for functions with a memoization library like data-memocombinators.
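The contrast can be sketched side by side: the list from the question overwrites each forced thunk with its value, so indexing it does O(n) total work, while a plain recursive function (fibSlow below, a hypothetical name standing in for the unmemoized version) recomputes every subproblem:

```haskell
-- The list-based definition from the question: each cell is a thunk
-- that, once forced, is updated in place with its value, so later
-- accesses to the same prefix are free (sharing).
fibs :: [Integer]
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)

fibShared :: Int -> Integer
fibShared n = fibs !! n

-- The function-based equivalent: no memo table for arguments, so
-- subproblems are recomputed and the running time is exponential.
fibSlow :: Int -> Integer
fibSlow 0 = 1
fibSlow 1 = 1
fibSlow n = fibSlow (n - 1) + fibSlow (n - 2)
```

Both compute the same numbers; only the sharing behaviour differs.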

answered Oct 16 '22 by luqui

As far as I know (though I don't know how much of this is GHC-specific):

  1. length and (!!) DO have to iterate through the list.

  2. I don't think there are any special optimisations for lists, but there is a technique that applies to all datatypes.

    If you have something like

    foo xs = bar (length xs) ++ baz (length xs) 

    then length xs will be computed twice.

    But if instead you have

    foo xs = bar len ++ baz len
      where len = length xs

    then it will only be computed once.

  3. Yes.

  4. Yes, once part of a named value is computed, it is retained until the name goes out of scope. (The language doesn't require this, but this is how I understand the implementations behave.)
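Points 2 and 4 can be illustrated together in a short sketch (fooShared, fooUnshared, and squares are hypothetical names introduced here):

```haskell
-- Point 2: naming a result with 'where' (or 'let') shares it.
-- 'len' is a single thunk, forced at most once.
fooShared :: [Int] -> Int
fooShared xs = len + len
  where len = length xs

-- Here each 'length xs' is a separate expression; GHC may merge them
-- via common-subexpression elimination, but is not required to.
fooUnshared :: [Int] -> Int
fooUnshared xs = length xs + length xs

-- Point 4: a named top-level list (a CAF). Once an element has been
-- forced, it stays evaluated for as long as 'squares' is reachable,
-- so a second access to the same index does no recomputation.
squares :: [Integer]
squares = [ n * n | n <- [0..] ]
```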

answered Oct 16 '22 by dave4420