Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scala combinations function not terminating

Tags:

scala

scalaz

I need to generate the combinations for a list of 30,000 items using scalas combinations method on a stream / list

1 to 30000.toStream.combinations(2).size 

This function never completes. When I try the same operation in python

r = list(range(1,30000))
z = itertools.combinations(r, 2)
%time sum(1 for _ in z)

The operation completes in 26.2 seconds.

Whats going on here? How can I generate the combinations of a very large list in scala?

like image 899
user2726995 Avatar asked Nov 15 '16 15:11

user2726995


People also ask

Which functions do not return anything in Scala?

The functions which do not return anything in Scala, they are called procedures. object Hello { def printMe ( ) : Unit = { println ("Hello, Scala!") } } Scala provides a number of syntactic variations for invoking methods. Following is the standard way to call a method −

How to call a method in Scala?

Scala provides a number of syntactic variations for invoking methods. Following is the standard way to call a method − If a function is being called using an instance of the object, then we would use dot notation similar to Java as follows − Try the following example program to define and then call the same function.

Why is Scala called a functional language?

Save the above program in Demo.scala. The following commands are used to compile and execute this program. Scala functions are the heart of Scala programming and that's why Scala is assumed as a functional programming language. Following are few important concepts related to Scala functions which should be understood by a Scala programmer.

What is function composition in Scala?

Scala | Function Composition. Function composition is a way in which a function is mixed with other functions. During the composition the one function holds the reference to another function in order to fulfill it’s mission.


2 Answers

@TomasMikula provided an alternative, I was interested to see why combinations was inefficient in generating the result.

A quick look using Mission Control and Flight Recorder revealed the problem:

Mission Control

The CombinationItr iterator invokes IndexedSeqOptimized.slice each iteration of next(). ArrayBuilder creates a new builder each time it runs with number of elements it needs to iterate, which means it will allocate 30,000 Array[Int], each of them containing n - 1 elements, causing a total of 11.10GB in a 1 minute sample. This causes massive amounts of GC pressure and is generally not very effecient.

like image 79
Yuval Itzchakov Avatar answered Sep 23 '22 16:09

Yuval Itzchakov


I don't know why the implementation in stdlib takes so long. However, this straightforward implementation (specialized to pairs and Lists), is comparable to the Python one:

def combinations2[A](l: List[A]): Iterator[(A, A)] =
  l.tails.flatMap(_ match {
    case h :: t => t.iterator.map((h, _))
    case Nil => Iterator.empty
  })

Then

scala> {
     |   val t0 = System.nanoTime
     |   val res = combinations2((1 to 30000).toList).size
     |   val secs = (System.nanoTime - t0) / 1000000.0
     |   s"$res (computed in $secs seconds)"
     | }
res11: String = 449985000 (computed in 24992.487638 seconds)
like image 43
Tomas Mikula Avatar answered Sep 22 '22 16:09

Tomas Mikula