Why does concatenation of lists take O(n)?

Tags:

complexity-theory

functional-programming

haskell

algebraic-data-types

According to the theory of ADTs (algebraic data types), concatenating two lists takes O(n) time, where n is the length of the first list. Basically, you have to recurse through the first list until you reach its end.

From a different point of view, one can argue that the second list can simply be linked to the last element of the first. This would take constant time, if the end of the first list is known.

What am I missing here?

asked Feb 09 '15 09:02

Radu Stoenescu

1 Answer

Operationally, a Haskell list is typically represented (roughly) by a pointer to the first cell of a singly linked list. In this way, tail just returns the pointer to the next cell (it does not have to copy anything), and consing x : onto the front of the list allocates a new cell, makes it point to the old list, and returns the new pointer. The list accessed through the old pointer is unchanged, so there's no need to copy it.
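As a minimal sketch of the paragraph above (the names xs, ys and zs are made up for illustration), consing allocates a single new cell in front of an existing list, and tail hands back the existing rest of the list:

```haskell
-- An existing list; its cells are never modified after construction.
xs :: [Int]
xs = [2, 3, 4]

-- Consing allocates one new cell whose "next" pointer refers to xs.
-- The cells of xs are reused, not copied.
ys :: [Int]
ys = 1 : xs

-- tail returns the rest of ys, which is the list xs itself.
zs :: [Int]
zs = tail ys

main :: IO ()
main = do
  print xs
  print ys
  print zs
```

Both operations take constant time regardless of the length of xs, because neither has to walk the list.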

If you instead append a value with ++ [x], then you cannot modify the original linked list by changing its last pointer, unless you know that the original list will never be accessed again. More concretely, consider

x = [1..5]
n = length (x ++ [6]) + length x

If x were modified in place when evaluating x++[6], the value of n would turn out to be 12, which is wrong. The last x refers to the unchanged list, which has length 5, so the result of n must be 11.

In practice, you can't expect the compiler to optimize this, even in those cases in which x is no longer used and could, theoretically, be updated in place (a "linear" use). What happens is that the evaluation of x++[6] must be ready for the worst case, in which x is reused afterwards, and so it must copy the whole list x.
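To see where the O(n) comes from, here is a sketch of the usual recursive definition of append. The name append is hypothetical and chosen to avoid clashing with the Prelude's (++); the real library version also carries optimization rules that are left out here.

```haskell
-- Hand-written append, mirroring the usual recursive definition of (++).
append :: [a] -> [a] -> [a]
append []      ys = ys                -- end of the first list: reuse ys unchanged
append (h : t) ys = h : append t ys   -- one new cell per element of the first list

x :: [Int]
x = [1 .. 5]

-- x is still the original five-element list after being appended to.
n :: Int
n = length (append x [6]) + length x

main :: IO ()
main = print n  -- prints 11
```

The recursion allocates one new cell for each element of the first list and never touches the cells of the second, which is why the cost depends only on the length of the first argument.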

As @Ben notes, saying "the list is copied" is imprecise. What actually happens is that the cells holding the pointers are copied (the so-called "spine" of the list), but the elements are not. For instance,

x = [[1,2],[2,3]]
y = x ++ [[3,4]]

requires allocating [1,2], [2,3] and [3,4] only once. The lists of lists x and y share pointers to the lists of integers, which do not have to be duplicated.
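One way to observe that append works on the spine only is to put an element in the list that would crash if it were ever inspected. The sketch below is a variation on the example above, with the second element replaced by undefined purely for illustration:

```haskell
-- The second element is a placeholder that throws if it is ever evaluated.
x :: [[Int]]
x = [[1, 2], undefined]

-- Appending rebuilds the spine of x but only passes the element pointers along.
y :: [[Int]]
y = x ++ [[3, 4]]

main :: IO ()
main = print (length y)  -- prints 3; the undefined element is never evaluated
```

This does not show the sharing directly, but it is consistent with it: neither ++ nor length looks inside the elements, so there is nothing for append to duplicate beyond the cells of the spine.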

answered Sep 22 '22 02:09

chi
