Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient way to convert Scala Array to Unique Sorted List

Can anybody optimize following statement in Scala:

// maybe large
val someArray = Array(9, 1, 6, 2, 1, 9, 4, 5, 1, 6, 5, 0, 6) 

// output a sorted list which contains unique element from the array without 0
val newList=(someArray filter (_>0)).toList.distinct.sort((e1, e2) => (e1 > e2))

Since the performance is critical, is there a better way?

Thank you.

like image 215
Tianyi Liang Avatar asked Nov 16 '11 23:11

Tianyi Liang


People also ask

How do you use distinct in Scala?

Scala List distinct() method with example. The distinct() method is utilized to delete the duplicate elements from the stated list. Return Type: It returns a new list of elements without any duplicates.

How do I sort an array in Scala?

Use the sortBy(attribute) Method to Sort an Array in Scala The sortBy method in Scala can sort based on one or more class attributes. This method can be used if we have an Ordering field type in the scope. Here, the default sorting is ascending as well.

Can list have different data types in Scala?

In a Scala list, each element need not be of the same data type.


2 Answers

This simple line is one of the fastest codes so far:

someArray.toList.filter (_ > 0).sortWith (_ > _).distinct

but the clear winner so far is - due to my measurement - Jed Wesley-Smith. Maybe if Rex' code is fixed, it looks different.

bench diagram

Typical disclaimer 1 + 2:

  1. I modified the codes to accept an Array and return an List.
  2. Typical benchmark considerations:
    • This was random data, equally distributed. For 1 Million elements, I created an Array of 1 Million ints between 0 and 1 Million. So with more or less zeros, and more or less duplicates, it might vary.
    • It might depend on the machine etc.. I used a single core CPU, Intel-Linux-32bit, jdk-1.6, scala 2.9.0.1

Here is the underlying benchcoat-code and the concrete code to produce the graph (gnuplot). Y-axis: time in seconds. X-axis: 100 000 to 1 000 000 elements in Array.

update:

After finding the problem with Rex' code, his code is as fast as Jed's code, but the last operation is a transformation of his Array to a List (to fullfill my benchmark-interface). Using a var result = List [Int], and result = someArray (i) :: result speeds his code up, so that it is about twice as fast as the Jed-Code.

Another, maybe interesting, finding is: If I rearrange my code in the order of filter/sort/distinct (fsd) => (dsf, dfs, fsd, ...), all 6 possibilities don't differ significantly.

like image 134
user unknown Avatar answered Sep 21 '22 08:09

user unknown


I haven't measured, but I'm with Duncan, sort in place then use something like:

util.Sorting.quickSort(array)
array.foldRight(List.empty[Int]){ 
  case (a, b) => 
    if (!b.isEmpty && b(0) == a) 
      b 
    else 
      a :: b 
}

In theory this should be pretty efficient.

like image 28
Jed Wesley-Smith Avatar answered Sep 23 '22 08:09

Jed Wesley-Smith