Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Could this Scala code use less memory?

Consider the following Set benchmark:

import scala.collection.immutable._

object SetTest extends App {
  def time[a](f: => a): (a,Double) = {
    val start = System.nanoTime()
    val result: a = f
    val end = System.nanoTime()
    (result, 1e-9*(end-start))
  }

  for (n <- List(1000000,10000000)) {
    println("n = %d".format(n))
    val (s2,t2) = time((Set() ++ (1 to n)).sum)
    println("sum %d, time %g".format(s2,t2))
  }
}

Compiling and running produces

tile:scalafab% scala SetTest
n = 1000000
sum 1784293664, time 0.982045
n = 10000000
Exception in thread "Poller SunPKCS11-Darwin" java.lang.OutOfMemoryError: Java heap space
...

I.e., Scala is unable to represent a set of 10 million Ints on a machine with 8 GB of memory. Is this expected behavior? Is there some way to reduce the memory footprint?

like image 519
Geoffrey Irving Avatar asked Aug 20 '11 18:08

Geoffrey Irving


2 Answers

Generic immutable sets do take a lot of memory. The default is for only 256M heap, which leaves only 26 bytes per object. The hash trie for immutable sets generally takes one to two hundred bytes per object an extra 60 or so bytes per element. If you add -J-Xmx2G on your command line to increase the heap space to 2G, you should be fine.

(This level of overhead is one reason why there are bitsets, for example.)

like image 112
Rex Kerr Avatar answered Oct 04 '22 03:10

Rex Kerr


I'm not that familiar with Scala, but here's what I think is happening:

First off, the integers are being stored on the heap (as the must be, since the data structure is stored on the heap). So we are talking about available heap memory, not stack memory at all (just to clarify the validity of what I'm about to say next).

The real kicker is that Java's default heap size is pretty small - I believe its only 128 megabytes (this is probably an really old number, but the point is that the number exists, and it's quite small).

So it's not that your program uses too much memory - it's more like Java just doesn't give you enough in the first place. There is a solution, though: the minimum and maximum heap sizes can be set with the -Xms and -Xmx command line options. They can be used like:

java -Xms32m -Xmx128m MyClass   (starts MyClass with a minimum heap of 32 megabytes, maximum of 128 megabytes)

java -Xms1g -Xmx3g MyClass (executes MyClass with a minimum heap of 1 gigabytes, maximum of 3 gigabytes)

If you use an IDE, there are probably options in there to change the heap size as well.

like image 43
Ken Wayne VanderLinde Avatar answered Oct 04 '22 04:10

Ken Wayne VanderLinde