
Why is `++` for Haskell lists implemented recursively, and why does it cost O(n) time?

As I understand it, a list in Haskell is similar to a linked list in C.

So for expressions below:

a = [1,2,3]
b = [4,5,6]
a ++ b

Haskell implements this recursively, along these lines:

(++) []     ys = ys
(++) (x:xs) ys = x : (xs ++ ys)

The time complexity of this is O(n), where n is the length of the first list.

However, I was wondering why I can't implement ++ more efficiently.

The most efficient way might be something like this:

  1. Make a copy (fork) of a, let's call it a'. There may be some trick to do this in O(1) time.

  2. Make the last element of a' point to the first element of b. This can easily be done in O(1) time (see the mutation sketch after this list).
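
For illustration, here is a sketch of what step 2 would look like with a hand-rolled mutable list; the Node type and spliceOnto are hypothetical, not Haskell's built-in lists:

import Data.IORef

-- Hypothetical mutable list type (NOT Haskell's built-in []),
-- just to illustrate the pointer surgery of step 2.
data Node a = Nil | Cons a (IORef (Node a))

-- Destructively redirect the last cell of the first list to the second.
-- Finding the last cell is O(n) here, but with a cached tail pointer
-- the redirect itself would be O(1) -- at the price of mutation.
spliceOnto :: Node a -> Node a -> IO ()
spliceOnto Nil _ = return ()   -- an empty list has no cell to redirect
spliceOnto (Cons _ ref) ys = do
  next <- readIORef ref
  case next of
    Nil  -> writeIORef ref ys  -- the O(1) pointer update
    rest -> spliceOnto rest ys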

Does anyone have ideas about this? Thanks!

asked Jun 07 '15 by Hanfei Sun


3 Answers

That's pretty much what the recursive solution does. It's the copying of a which takes O(n), where n is the length of a; the length of b doesn't affect the complexity.

There is really no "trick" to copy a list of n elements in O(1) time.
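
To make that copying visible, here is a small sketch; appendT is an illustrative instrumented version of ++, not the real Prelude definition:

import Debug.Trace (trace)

-- Same shape as (++), but logs every fresh cons cell it builds.
appendT :: Show a => [a] -> [a] -> [a]
appendT []     ys = ys
appendT (x:xs) ys = trace ("copying " ++ show x) (x : appendT xs ys)

Forcing appendT [1,2,3] [4,5,6] (e.g. by printing it) traces exactly three "copying" messages, one per element of the first list; the second list is reused untouched.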

answered Nov 16 '22 by shang


See, the copy (fork) part is exactly the problem: the recursive solution does just this, and you really have to do it, because you have to adjust all the pointers for the elements in the a list.

Let's say a = [a1,a2,a3] and b is some list.

You have to make a new copy of a3 (let's call it a3') because it should now no longer point to an empty list but to the start of b.

Then you have to make a copy of the second-to-last element a2 too (call it a2'), because it must point to a3', and finally, for the same reason, you have to create a new copy a1' of a1, pointing to a2'.
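
Written out as Haskell (a hypothetical concatByHand with the elements passed in explicitly), those three fresh cells look like this:

-- The three fresh cells built for [a1,a2,a3] ++ b, by hand.
concatByHand :: a -> a -> a -> [a] -> [a]
concatByHand a1 a2 a3 b =
  let a3' = a3 : b    -- new last cell, pointing at b instead of []
      a2' = a2 : a3'  -- new cell pointing at a3'
      a1' = a1 : a2'  -- new cell pointing at a2'
  in a1'              -- the result; b itself is shared, never copied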

This is exactly what the recursive definition does. It's not a problem with the algorithm; it's a problem with the data structure (singly linked lists are just not good at concatenation).

If you don't allow mutability and want the structure of a list, there is really nothing else you can do.

You have this in other languages too if they provide immutable data. For example, in .NET strings are immutable, so string concatenation has almost the same problem as here (if you concatenate lots of strings, your program will perform poorly). There are workarounds (e.g. StringBuilder) that deal better with the memory footprint, but of course those are no longer immutable data structures.
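
The usual Haskell counterpart to StringBuilder is the difference list: represent a list by the function that prepends it, so each append is O(1) function composition and the real list is materialized only once at the end. A minimal sketch (the names DList, fromList, toList and appendD are illustrative; the dlist package on Hackage provides a polished version):

-- A difference list represents a list as the function that prepends it.
type DList a = [a] -> [a]

fromList :: [a] -> DList a
fromList = (++)

toList :: DList a -> [a]
toList f = f []   -- build the real list once, at the end

appendD :: DList a -> DList a -> DList a
appendD = (.)     -- appending is function composition, O(1)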

answered Nov 16 '22 by Random Dev


There is no way to do that concatenation in constant time, simply because the immutability of the data structure doesn't allow it.


You might think that you could do something similar to the "cons" operator (:), which adds an element x0 to the front of a list oldList = [x1,x2,x3] (resulting in newList = x0 : oldList) without having to run through the whole list. But that only works because you don't touch the existing list oldList; you simply reference it.

x0  :  ( x1  :  ( x2  :  ( x3  :  [] )   )   )
^        ^
newList  oldList

But in your case (a ++ b) we are talking about updating a reference deep within the data structure. You want to replace the [] in 1:(2:(3:[])) (the explicit form of [1,2,3]) with the new tail b. Just count the parentheses and you'll see that we have to go deep inside to reach the []. That's always expensive, because we have to duplicate the whole outer part to make sure that a stays unmodified. In the resulting list, where would the old a point in order to still see the unmodified list?

1  :  ( 2  :  ( 3  :  b  )   )
^                     ^
a++b                  b

That's impossible in the same data structure. So we need a second one:

1  :  ( 2  :  ( 3  :  []  )   )
^
a

And that means duplicating those : nodes, which necessarily costs the linear time mentioned above in the length of the first list. The "copy (fork)" you mentioned is therefore, contrary to what you said, not O(1).


> make a copy (fork) of a, let's call it a', there may be some tricks to do this in O(1) time

When you talk about a "trick" to fork something in constant time, you are probably thinking of not actually making a full copy, but of creating a reference to the original a, with the changes stored as "annotations" (like the hint: "modification to tail: use b instead of []").

But that's what Haskell, thanks to its laziness, does anyway! It doesn't immediately execute the O(n) algorithm; it just "remembers" that you wanted a concatenated list, until you actually access its elements. But that doesn't save you from paying the cost in the end. Even though the reference was cheap to create (O(1), just as you wanted), when you do access the actual list elements, every instance of the ++ operator adds a little overhead (the cost of "interpreting the annotation" you added to your reference) to the access of every element in the first part of the concatenation, so in the end you pay the O(n) cost after all.
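
As a small illustration of that per-layer overhead (an assumed example, not from the answer): in a left-nested chain each additional ++ re-walks everything built so far, while right-nesting walks each list once.

-- Forcing leftNested copies the cells of [1,2,3] twice (once per
-- surrounding ++ layer); rightNested copies them only once.
leftNested, rightNested :: [Int]
leftNested  = ([1,2,3] ++ [4,5,6]) ++ [7,8,9]
rightNested = [1,2,3] ++ ([4,5,6] ++ [7,8,9])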

answered Nov 16 '22 by mastov