Scope & memory issues in Scala

I have a very large List of numbers that undergoes a lot of mathematical manipulation. I only care about the final result. To simulate this behavior, see my example code below:

object X {
    def main(args: Array[String]): Unit = {
        val N = 10000000
        val x = List(1 to N).flatten
        println(x.slice(0, 10))
        Thread.sleep(5000)
        val y = x.map(_ * 5)
        println(y.slice(0, 10))
        Thread.sleep(5000)
        val z = y.map(_ + 4)
        println(z.slice(0, 10))
        Thread.sleep(5000)
    }
}

So x is a very large list, and I care only about the result z. To obtain z, I first have to mathematically manipulate x to get y, and then manipulate y to get z. (I cannot go from x to z in one step, because the manipulations are quite complicated. This is just an example.)

When I run this example, I run out of memory, presumably because x, y and z are all in scope and all occupy memory.

So I try the following:

def main(args: Array[String]): Unit = {
    val N = 10000000
    val z = {
        val y = {
            val x = List(1 to N).flatten
            println(x.slice(0, 10))
            Thread.sleep(5000)
            x
        }.map(_ * 5)

        println(y.slice(0, 10))
        Thread.sleep(5000)
        y
    }.map(_ + 4)
    println(z.slice(0, 10))
    Thread.sleep(5000)
}

Now only z is in scope, so presumably x and y are created and then garbage collected once they go out of scope. But that isn't what happens: I run out of memory again!

(Note: I am running java with -Xincgc, but it doesn't help.)

Question: When I have adequate memory for only one large list, can I somehow manipulate it using only vals (i.e., no mutable vars or ListBuffers), maybe using scoping to force GC? If so, how? Thanks.

k r asked Nov 25 '11 19:11



2 Answers

Have you tried something like this?

val N = 10000000
val x = List(1 to N).flatten.view // get a view
val y = x.map(_ * 5)
val z = y.map(_ + 4)
println(z.force.slice(0, 10))

This should avoid building a full intermediate collection for y. The mapped values are computed only when force is called, which is also the point at which z is materialized.
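If only the first few results are needed for printing, the view does not have to be forced in full. The following is a minimal sketch of that idea, not the asker's real computation: `ViewPipeline`, `step1`, `step2` and `firstResults` are hypothetical names, and the two steps simply mirror the `_ * 5` and `_ + 4` from the question. It also assumes the input can start from a Range rather than a fully built List.

```scala
object ViewPipeline {
  // Hypothetical stand-ins for the question's two "complicated" steps.
  def step1(n: Int): Int = n * 5
  def step2(n: Int): Int = n + 4

  // Chains both steps over a view of the range. Nothing is computed
  // until toList runs, and take(count) limits the result to the first
  // `count` elements, so no full-size collection is built here.
  def firstResults(n: Int, count: Int): List[Int] =
    (1 to n).view.map(step1).map(step2).take(count).toList

  def main(args: Array[String]): Unit =
    println(firstResults(10000000, 10))
}
```

Run with N = 10000000, this prints List(9, 14, 19, 24, 29, 34, 39, 44, 49, 54), the same first ten values as z in the question.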

huynhjl answered Sep 26 '22 00:09



Look at using view. It wraps a collection and evaluates it lazily, computing each value only when it is required, so it doesn't build an intermediate collection:

scala> (1 to 5000000).map(i => {i*i}).map(i=> {i*2}) .toList
java.lang.OutOfMemoryError: Java heap space
        at java.lang.Integer.valueOf(Integer.java:625)
        at scala.runtime.BoxesRunTime.boxToInteger(Unknown Source)
        at scala.collection.immutable.Range.foreach(Range.scala:75)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:194)
        at scala.collection.immutable.Range.map(Range.scala:43)
        at .<init>(<console>:8)
        at .<clinit>(<console>)
        at .<init>(<console>:11)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:704)
        at scala.tools.nsc.interpreter.IMain$Request$$anonfun$14.apply(IMain.scala:920)
        at scala.tools.nsc.interpreter.Line$$anonfun$1.apply$mcV$sp(Line.scala:43)
        at scala.tools.nsc.io.package$$anon$2.run(package.scala:25)
        at java.lang.Thread.run(Thread.java:662)
scala> (1 to 5000000).view.map(i => {i*i}).view.map(i=> {i*2}) .toList
res10: List[Int] = List(2, 8, 18, 32, 50, 72, ...
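
When the whole result is needed rather than a prefix, an Iterator gives a similar effect to a view. This is a different technique from the one shown above, sketched under the assumption that the input can be generated on the fly instead of being held as a List; `IteratorPipeline`, `step1`, `step2` and `run` are hypothetical names standing in for the real manipulations.

```scala
object IteratorPipeline {
  // Hypothetical stand-ins for the question's two "complicated" steps.
  def step1(n: Int): Int = n * 5
  def step2(n: Int): Int = n + 4

  // Iterator.range(1, n + 1) yields 1 to n one element at a time.
  // Each element passes through both steps before the next is produced,
  // so the only full-size collection built is the final List.
  def run(n: Int): List[Int] =
    Iterator.range(1, n + 1).map(step1).map(step2).toList
}
```

Calling IteratorPipeline.run(5) returns List(9, 14, 19, 24, 29). The final List still has to fit in the heap, but x and y never exist as separate lists.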
Matthew Farwell answered Sep 23 '22 00:09
