Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does concatenation of lists take O(n)?

According to the theory of ADTs (Algebraic Data Types) the concatenation of two lists has to take O(n) where n is the length of the first list. You, basically, have to recursively iterate through the first list until you find the end.

From a different point of view, one can argue that the second list can simply be linked to the last element of the first. This would take constant time, if the end of the first list is known.

What am I missing here ?

like image 398
Radu Stoenescu Avatar asked Feb 09 '15 09:02

Radu Stoenescu


People also ask

Is used if the concatenation of two lists is to be performed on o 1 time?

You can easily concatenate two lists in O(1) time using either a single linked list or a doubly linked list, provided that you have a pointer to the last node in at least one of the lists. (And, of course, pointers to the list heads.)

Which of the given options should be used to concatenate two lists on o 1 time?

Explanation: We can easily concatenate two lists in O (1) time using singly or doubly linked list, provided that we have a pointer to the last node at least one of the lists.


1 Answers

Operationally, an Haskell list is typically represented by a pointer to the first cell of a single-linked list (roughly). In this way, tail just returns the pointer to the next cell (it does not have to copy anything), and consing x : in front of the list allocates a new cell, makes it point to the old list, and returns the new pointer. The list accessed by the old pointer is unchanged, so there's no need to copy it.

If you instead append a value with ++ [x], then you can not modify the original liked list by changing its last pointer unless you know that the original list will never be accessed. More concretely, consider

x = [1..5]
n = length (x ++ [6]) + length x

If you modify x when doing x++[6], the value of n would turn up to be 12, which is wrong. The last x refer to the unchanged list which has length 5, so the result of n must be 11.

Practically, you can't expect the compiler to optimize this, even in those cases in which x is no longer used and it could, theoretically, be updated in place (a "linear" use). What happens is that the evaluation of x++[6] must be ready for the worst-case in which x is reused afterwards, and so it must copy the whole list x.

As @Ben notes, saying "the list is copied" is imprecise. What actually happens is that the cells with the pointers are copied (the so-called "spine" on the list), but the elements are not. For instance,

x = [[1,2],[2,3]]
y = x ++ [[3,4]]

requires only to allocate [1,2],[2,3],[3,4] once. The lists of lists x,y will share pointers to the lists of integers, which do not have to be duplicated.

like image 97
chi Avatar answered Sep 22 '22 02:09

chi