Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scala: fastest `remove(i: Int)` in mutable sequence

Which implementation from scala.collection.mutable package should I take if I intend to do lots of by-index-deletions, like remove(i: Int), in a single-threaded environment? The most obvious choice, ListBuffer, says that it may take linear time depending on buffer size. Is there some collection with log(n) or even constant time for this operation?

like image 856
BorisOkunskiy Avatar asked Dec 14 '10 17:12

BorisOkunskiy


3 Answers

Removal operators, including buf remove i, are not part of Seq, but it's actually part of Buffer trait under scala.mutable. (See Buffers)

See the first table on Performance Characteristics. I am guessing buf remove i has the same characteristic as insert, which are linear for both ArrayBuffer and ListBuffer. As documented in Array Buffers, they use arrays internally, and Link Buffers use linked lists (that's still O(n) for remove).

As an alternative, immutable Vector may give you an effective constant time.

Vectors are represented as trees with a high branching factor. Every tree node contains up to 32 elements of the vector or contains up to 32 other tree nodes. [...] So for all vectors of reasonable size, an element selection involves up to 5 primitive array selections. This is what we meant when we wrote that element access is "effectively constant time".

scala> import scala.collection.immutable._
import scala.collection.immutable._

scala> def remove[A](xs: Vector[A], i: Int) = (xs take i) ++ (xs drop (i + 1))
remove: [A](xs: scala.collection.immutable.Vector[A],i: Int)scala.collection.immutable.Vector[A]

scala> val foo = Vector(1, 2, 3, 4, 5)
foo: scala.collection.immutable.Vector[Int] = Vector(1, 2, 3, 4, 5)

scala> remove(foo, 2)
res0: scala.collection.immutable.Vector[Int] = Vector(1, 2, 4, 5)

Note, however, a high constant time with lots of overhead may not win a quick linear access until the data size is significantly large.

like image 58
Eugene Yokota Avatar answered Oct 03 '22 18:10

Eugene Yokota


Depending on your exact use case, you may be able to use LinkedHashMap from scala.collection.mutable.

Although you cannot remove by index, you can remove by a unique key in constant time, and it maintains a deterministic ordering when you iterate.

scala> val foo = new scala.collection.mutable.LinkedHashMap[String,String]
foo: scala.collection.mutable.LinkedHashMap[String,String] = Map()

scala> foo += "A" -> "A"
res0: foo.type = Map((A,A))

scala> foo += "B" -> "B"
res1: foo.type = Map((A,A), (B,B))

scala> foo += "C" -> "C"
res2: foo.type = Map((A,A), (B,B), (C,C))

scala> foo -= "B"
res3: foo.type = Map((A,A), (C,C))
like image 22
Mike Avatar answered Oct 03 '22 18:10

Mike


Java's ArrayList effectively has constant time complexity if the last element is the one to be removed. Look at the following snippet copied from its source code,

        int numMoved = size - index - 1;
        if (numMoved > 0)
            System.arraycopy(elementData, index+1, elementData, index,
                             numMoved);
        elementData[--size] = null; // clear to let GC do its work

As you can see, if numMoved is equal to 0, remove will not shift and copy the array at all. This in some scenarios can be quite useful. For example, if you do not care about the ordering that much, to remove an element, you can always swap it with the last element, and then delete the last element from the ArrayList, which effectively makes the remove operation all the way constant time. I was hoping ArrayBuffer would do the same, unfortunately that is not the case.

like image 32
Sheng Avatar answered Oct 03 '22 19:10

Sheng