As I understand it, a list in Haskell is similar to a linked list in C.
So for the expressions below:
a = [1,2,3]
b = [4,5,6]
a ++ b
Haskell implements this recursively, like this:
(++) []     ys = ys
(++) (x:xs) ys = x : (xs ++ ys)
The time complexity of that is O(n), where n is the length of the first list.
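To see where the O(n) comes from, here is the evaluation of the example above, expanded step by step; each step allocates one new (:) cell for an element of a:
[1,2,3] ++ [4,5,6]
= 1 : ([2,3] ++ [4,5,6])
= 1 : (2 : ([3] ++ [4,5,6]))
= 1 : (2 : (3 : ([] ++ [4,5,6])))
= 1 : (2 : (3 : [4,5,6]))
= [1,2,3,4,5,6]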
However, I was wondering why I can't implement ++ more efficiently. The most efficient way might be:

1. Make a copy (fork) of a, let's call it a'; there may be some trick to do this in O(1) time.
2. Make the last element of a' point to the first element of b. This can be done easily in O(1) time.

Does anyone have ideas about this? Thanks!
That's pretty much what the recursive solution does. It's the copying of a which takes O(n), where n is the length of a (the length of b doesn't affect the complexity). There is really no "trick" to copy a list of n elements in O(1) time.
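A minimal sketch of what does and does not get copied, using the names from the question:
a = [1,2,3]
b = [4,5,6]
c = a ++ b   -- builds 3 new (:) cells for a's elements; b's cells are shared, not copied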
The copy (fork) part is the problem: the recursive solution does exactly this (and you really have to do it, because you have to adjust all the pointers for the elements of the a list).

Let's say a = [a1,a2,a3] and b is some list.
You have to make a new copy of a3 (let's call it a3'), because it should now no longer point to the empty list but to the start of b. Then you have to make a copy of the second-to-last element a2 too (call the copy a2'), because it must point to a3'. And finally, for the same reason, you have to create a new copy of a1 as well (pointing to a2').
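Written out with the question's concrete values (a1', a2', a3' are just illustrative names for the freshly allocated cells), the three copies are:
b   = [4,5,6]
a3' = 3 : b     -- new cell: its tail is b instead of []
a2' = 2 : a3'   -- new cell: its tail is the copy a3'
a1' = 1 : a2'   -- new cell: a1' is the result a ++ b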
This is exactly what the recursive definition does. It's not a problem with the algorithm; it's a problem with the data structure (it's just not good at concatenation).
If you don't allow mutability and want the structure of a list, there is really nothing else you can do.
You have this in other languages too, if they provide immutable data. For example, in .NET strings are immutable, so there is almost the same problem with string concatenation as here (if you concatenate lots of strings, your program will perform poorly). There are workarounds (StringBuilder) that deal better with the memory footprint, but of course those are no longer immutable data structures.
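Haskell has an analogous workaround for concatenation-heavy code: difference lists, where a list is represented as a function that prepends it (this is the idea behind the dlist package). A minimal sketch:
type DList a = [a] -> [a]

fromList :: [a] -> DList a
fromList xs = (xs ++)

toList :: DList a -> [a]
toList d = d []

appendD :: DList a -> DList a -> DList a
appendD f g = f . g   -- O(1): just function composition; all the real (++) work happens once, in toList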
There is no way to do that concatenation in constant time, simply because the immutability of the data structure doesn't allow it.
You might think that you could do something similar to the "cons" operator (:), which adds an additional element x0 to the front of a list oldList = [x1,x2,x3] (resulting in newList = x0 : oldList) without having to run through the whole list. But that works only because you don't touch the existing list oldList; you simply reference it.
x0 : ( x1 : ( x2 : ( x3 : [] ) ) )
^      ^
newList oldList
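In code, the sharing looks like this (the bindings are just the names from the diagram, with concrete values):
oldList = [1,2,3]
newList = 0 : oldList   -- O(1): one new (:) cell whose tail is a reference to oldList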
But in your case (a ++ b) we are talking about updating a reference deep within the data structure. You want to replace the [] in 1:(2:(3:[])) (the explicit form of [1,2,3]) with the new tail b. Just count the parentheses and you'll see that we have to go deep inside to get to the []. That's always expensive, because we have to duplicate the whole outer part in order to make sure that a stays unmodified. In the resulting list, where would the old a have to point in order to still give the unmodified list?
1 : ( 2 : ( 3 : b ) )
^               ^
a++b            b
That's impossible in the same data structure. So we need a second one:
1 : ( 2 : ( 3 : [] ) )
^
a
And that means duplicating those (:) nodes, which necessarily costs the mentioned linear time in the first list. The "copy (fork)" that you mentioned is therefore, contrary to what you said, not O(1).
"make a copy (fork) of a, let's call it a', there may be some tricks to do this in O(1) time"
When you talk about a "trick" to fork something in constant time, you are probably thinking of not actually making a full copy, but creating a reference to the original a, with the changes stored as "annotations" (like the hint: "modification to tail: use b instead of []").

But that's what Haskell, thanks to its laziness, does anyway! It doesn't immediately execute the O(n) algorithm; it just "remembers" that you wanted a concatenated list, until you actually access its elements. That doesn't save you from paying the cost in the end, though. Even though the reference was cheap at first (O(1), just like you wanted), once you do access the actual list elements, every instance of the ++ operator adds a little overhead (the cost of "interpreting the annotation" that you added to your reference) to the access of every element in the first part of the concatenation, so you eventually pay the full O(n) cost after all.
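A small sketch of that deferred cost (the list sizes and the main wrapper are just for illustration):
main :: IO ()
main = do
  let a = [1 .. 1000000] :: [Int]
      b = [0]
  print (head (a ++ b))   -- cheap: only the first (:) cell of the result is built
  print (last (a ++ b))   -- expensive: forces a fresh (:) cell for every element of a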