Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can you sort a mutable Scala collection in place?

Is it possible to sort an ArrayBuffer, or other mutable Scala collection, in place? I see that ArrayBuffer.sorted (and sortBy) returns a new collection, and Sorting.quicksort does sort an Array in place but doesn't work on ArrayBuffers.

The reason I ask is that I'm using combineByKey in Spark to build collections of scored objects that are limited in size (like a "top ten" list by key). If I merge in a new object and the collection is already at capacity, I need to drop the lowest-scored object. I could use a sorted collection like a PriorityQueue or SortedSet, but I don't need to keep the collections sorted all the time, only in the case when a collection fills up.

So is there some way to sort an ArrayBuffer or ListBuffer in place? Or is there some other collection that supports appending and sorting in place? I'm sure there's a better way to do this, but I'm new to Scala.

like image 845
Matt Avatar asked Jun 20 '15 02:06

Matt


3 Answers

You can use Java's sorting utilities.

Here is an example:

val myArray = Array(1,12,5,6)
java.util.Arrays.sort(myArray)

At the REPL:

> myArray
res3: Array[Int] = Array(1, 5, 6, 12)

If what you have is a Scala ArrayBuffer then call toArray to convert it to an Array.

Of course, the toArray on ArrayBuffer induces the cost of coping the entire Buffer again. If this is costly, check if you can get your initial results in an Array instead of an ArrayBuffer. If the results are of fixed length and unlikely to grow, then you don't rally need the dynamic expansion features of ArrayBuffer.

like image 165
marios Avatar answered Sep 28 '22 00:09

marios


There are presently no facilities for sorting collections in place. That said, if you expect to have to do sorting extremely rarely, you could investigate supporting both separately e.g. as Either[PriorityQueue[A], ArrayBuffer[A]]; or if you expect sorting to be fairly common you should use a data structure where you don't pay such a penalty each time you add an element--meaning just use the SortedSet or PriorityQueue. Otherwise you'll get slow really fast. (n^2 log n gets big quickly, which is what you get if you do a full sort each time you add a new element.)

like image 39
Rex Kerr Avatar answered Sep 28 '22 02:09

Rex Kerr


You can use Scala's JavaConverters to delegate to Java's Arrays.sort with 1 line of code.

Assume you have instances of Foo in a mutable buffer that you want to sort in place with the comparator fooComparator.

import scala.collection.mutable
import scala.collection.JavaConverters._

…

val buffer = mutable.ArrayBuffer[Foo]()

…

buffer.asJava.sort(fooComparator) // sort "in place" (actually hides 1 copy)

However for extreme performance it seems that ArrayBuffer just can't be used and the plain fixed size Array is the way to go. The good thing is that JavaConverters.asJava does not copy the items. However Java's List.sort method internally copies the items to an Array and calls Arrays.sort. (It then assigns the sorted items back to original collection)

Perhaps the "full solution" would be to define your own version of Scala's ArrayBuffer that exposes the underlying array for sorting. Implementing your own collection types that can do all the same things as the original, plus your own tricks in Scala is usually easy due to the way how Scala's collection library is set up.

like image 33
Peter Lamberg Avatar answered Sep 28 '22 01:09

Peter Lamberg