
Why do Scala parallel collections sometimes cause an OutOfMemoryError?

This takes around 1 second

(1 to 1000000).map(_+3)

While this gives java.lang.OutOfMemoryError: Java heap space

(1 to 1000000).par.map(_+3)

EDIT:

I have the standard Scala 2.9.2 configuration. I am typing this at the Scala REPL prompt. In the bash launcher script I can see [ -n "$JAVA_OPTS" ] || JAVA_OPTS="-Xmx256M -Xms32M", and I don't have JAVA_OPTS set in my environment.

1 million integers = 8 MB; creating the list twice = 16 MB.
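If the default 256 MB heap shown above is indeed the limit being hit, one common workaround (a sketch, not taken from the answers below) is to launch the REPL with a larger heap via the same JAVA_OPTS variable the launcher script checks:

```shell
# Launch the Scala REPL with a larger heap for this session only.
# 1g is an illustrative value, not a recommendation from the answers.
JAVA_OPTS="-Xmx1g -Xms32M" scala
```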

FUD asked Jun 01 '12


2 Answers

It seems definitely related to the JVM memory options and to the memory required to store a parallel collection. For example:

scala> (1 to 1000000).par.map(_+3)

ends up with an OutOfMemoryError the third time I tried to evaluate it, while

scala> (1 to 1000000).par.map(_+3).seq

never failed. The issue is not the computation, it's the storage of the parallel collection.
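The difference above can be sketched as follows: appending .seq converts the parallel result back to a sequential collection, so the REPL line retains only a plain Vector[Int] rather than the parallel collection's structures (a sketch; result is an illustrative name):

```scala
// Sketch: convert the parallel result back to a sequential collection,
// so the REPL retains a plain Vector[Int] instead of the parallel
// collection's intermediate structures.
val result = (1 to 1000000).par.map(_ + 3).seq
```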

Nicolas answered Nov 02 '22


Several reasons for the failure:

  1. Parallel collections are not specialized, so the objects get boxed. This means that you can't just multiply the number of elements by 8 to get the memory usage.
  2. Using map means that the range is converted into a vector. An efficient concatenation for parallel vectors has not been implemented yet, so merging the intermediate vectors produced by different processors proceeds by copying, which requires more memory. This will be addressed in future releases.
  3. The REPL stores previous results, so the object evaluated on each line remains in memory.
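Points 1 and 3 above suggest a workaround when only a reduced value is needed (a sketch; sum is an illustrative name): aggregating to a single Long avoids materializing an intermediate parallel vector, and the REPL then retains one small result per line instead of a million boxed elements.

```scala
// Sketch: reduce on the parallel range instead of mapping it.
// seqop folds each chunk: (acc, elem) => acc + (elem + 3);
// combop merges the per-processor partial sums.
val sum = (1 to 1000000).par.aggregate(0L)(_ + (_ + 3), _ + _)
```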
axel22 answered Nov 02 '22