I need to generate the combinations for a list of 30,000 items using scalas combinations method on a stream / list
1 to 30000.toStream.combinations(2).size
This function never completes. When I try the same operation in python
r = list(range(1,30000))
z = itertools.combinations(r, 2)
%time sum(1 for _ in z)
The operation completes in 26.2 seconds.
Whats going on here? How can I generate the combinations of a very large list in scala?
The functions which do not return anything in Scala, they are called procedures. object Hello { def printMe ( ) : Unit = { println ("Hello, Scala!") } } Scala provides a number of syntactic variations for invoking methods. Following is the standard way to call a method −
Scala provides a number of syntactic variations for invoking methods. Following is the standard way to call a method − If a function is being called using an instance of the object, then we would use dot notation similar to Java as follows − Try the following example program to define and then call the same function.
Save the above program in Demo.scala. The following commands are used to compile and execute this program. Scala functions are the heart of Scala programming and that's why Scala is assumed as a functional programming language. Following are few important concepts related to Scala functions which should be understood by a Scala programmer.
Scala | Function Composition. Function composition is a way in which a function is mixed with other functions. During the composition the one function holds the reference to another function in order to fulfill it’s mission.
@TomasMikula provided an alternative, I was interested to see why combinations
was inefficient in generating the result.
A quick look using Mission Control and Flight Recorder revealed the problem:
The CombinationItr
iterator invokes IndexedSeqOptimized.slice
each iteration of next()
. ArrayBuilder
creates a new builder each time it runs with number of elements it needs to iterate, which means it will allocate 30,000 Array[Int]
, each of them containing n - 1 elements, causing a total of 11.10GB in a 1 minute sample. This causes massive amounts of GC pressure and is generally not very effecient.
I don't know why the implementation in stdlib takes so long. However, this straightforward implementation (specialized to pairs and List
s), is comparable to the Python one:
def combinations2[A](l: List[A]): Iterator[(A, A)] =
l.tails.flatMap(_ match {
case h :: t => t.iterator.map((h, _))
case Nil => Iterator.empty
})
Then
scala> {
| val t0 = System.nanoTime
| val res = combinations2((1 to 30000).toList).size
| val secs = (System.nanoTime - t0) / 1000000.0
| s"$res (computed in $secs seconds)"
| }
res11: String = 449985000 (computed in 24992.487638 seconds)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With