Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scala immutable list internal implementation

Suppose I am having a huge list having elements from 1 to 1 million.

val initialList = List(1,2,3,.....1 million)
and 
val myList = List(1,2,3)

Now when I apply an operation such as foldLeft on the myList giving initialList as the starting value such as

val output = myList.foldLeft(initialList)(_ :+ _)

// result ==>> List(1,2,3,.....1 million, 1 , 2 , 3)

Now my question comes here, both the lists being immutable the intermediate lists that were produced were

List(1,2,3,.....1 million, 1)
List(1,2,3,.....1 million, 1 , 2)
List(1,2,3,.....1 million, 1 , 2 , 3)

By the concept of immutability, every time a new list is being created and the old one being discarded. So isn't this operation a performance killer in scala as every time a new list of 1 million elements has to be copied to create a new list.

Please correct me if I am wrong as I am trying to understand the internal implementation of an immutable list. Thanks in advance.

like image 487
Chaitanya Waikar Avatar asked Mar 26 '26 02:03

Chaitanya Waikar


2 Answers

Yup, this is performance killer, but this is a cost of having immutable structures (which are amazing, safe and makes programs much less buggy). That's why you should often avoid appending list if you can. There is many tricks that can avoid this issue (try to use accumulators).

For example:

Instead of:

val initialList = List(1,2,3,.....1 million)
val myList = List(1,2,3,...,100)
val output = myList.foldLeft(initialList)(_ :+ _)

You can write:

val initialList = List(1,2,3,.....1 million)
val myList = List(1,2,3,...,100)
val output = List(initialList,myList).flatten

Flatten is implemented to copy first line only once instead of copying it for every single append.

P.S.

At least adding element to the front of list works fast (O(1)), cause sharing of old list is possible. Let's Look at this example: Linked list example You can see how memory sharing works for immutable linked lists. Computer only keeps one copy of (b,c,d) end. But if you want to append bar to the end of baz you cannot modify baz, cause you would destroy foo, bar and raz! That's why you have to copy first list.

like image 172
Piotr Banaś Avatar answered Mar 27 '26 16:03

Piotr Banaś


Appending to a List is not a good idea because List has linear cost for appending. So, if you can

  • either prepend to the List (List have constant time prepend)
  • or choose another collection that is efficient for appending. That would be a Queue

For the list of performance characteristic per operation on most scala collections, See:

https://docs.scala-lang.org/overviews/collections/performance-characteristics.html


Note that, depending on your requirement, you may also make your own smarter collection, such as chain iterable for example

like image 29
Juh_ Avatar answered Mar 27 '26 16:03

Juh_



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!