I am using Spark Standalone on a single machine with 128 GB of memory and 32 cores. The following settings seem relevant to my problem:
spark.storage.memoryFraction 0.35
spark.default.parallelism 50
spark.sql.shuffle.partitions 50
I have a Spark application with a loop over 1000 devices. In each iteration it prepares a feature vector for one device and then calls MLlib's k-means. Around the 25th to 30th iteration of the loop (processing the 25th to 30th device), it fails with "java.lang.OutOfMemoryError: Java heap space".
I tried lowering spark.storage.memoryFraction from 0.7 to 0.35, but it didn't help. I also tried raising parallelism/partitions to 200, with no luck. The JVM options are "-Xms25G -Xmx25G -XX:MaxPermSize=512m", and my data size is only about 2 GB.
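Roughly, the loop looks like this (a simplified sketch, not my actual code; the names rawData, deviceIds, and the k-means parameters are illustrative):

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// One k-means run per device; ~1000 devices in total
for (deviceId <- deviceIds) {
  val features = rawData
    .filter(_.deviceId == deviceId)
    .map(r => Vectors.dense(r.values))  // prepare the feature vector
    .cache()
  val model = KMeans.train(features, 4, 20)  // k = 4, maxIterations = 20
  // ... use model, e.g. save the cluster centers ...
}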
Here is the stack trace:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1841)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1533)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at scala.collection.mutable.HashMap$$anonfun$writeObject$1.apply(HashMap.scala:138)
at scala.collection.mutable.HashMap$$anonfun$writeObject$1.apply(HashMap.scala:136)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashTable$class.serializeTo(HashTable.scala:125)
at scala.collection.mutable.HashMap.serializeTo(HashMap.scala:40)
at scala.collection.mutable.HashMap.writeObject(HashMap.scala:136)
at sun.reflect.GeneratedMethodAccessor116.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
At the beginning the application looks fine, but as it runs and processes more and more devices, the Java heap gradually fills up and the memory is never released by the JVM. How can I diagnose and fix such a problem?
You can always use a profiler such as VisualVM to monitor memory growth. Hopefully you are using a 64-bit JVM and not a 32-bit one: a 32-bit process can only use about 2 GB of memory, so your heap settings would essentially have no effect. Hope this helps.
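To see what is actually on the heap at the moment of failure, you can also have the JVM write a heap dump on OOM and open it in VisualVM afterwards. For example, in spark-defaults.conf (the flags are standard HotSpot options; the dump paths are my assumption):

spark.driver.extraJavaOptions    -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/driver.hprof
spark.executor.extraJavaOptions  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/executor.hprof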
Apart from driver and executor memory, I would suggest trying a few other options, as sketched below.
Also, it would help if you could post the code.
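For example (illustrative settings, not an exhaustive or definitive list; the values need tuning for your workload). Since the stack trace shows Java serialization of a large HashMap, switching to Kryo may reduce serialization overhead:

spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max  256m
spark.rdd.compress               true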
JVM options alone are not sufficient for configuring Spark memory; you also need to set spark.driver.memory (for the driver, obviously) and spark.executor.memory (for the workers). Both default to 1 GB. See this thorough guide for more information. Actually, I urge you to read it: there is a lot of material there, and getting acquainted with it will definitely pay off later on.
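For example, in spark-defaults.conf (the sizes are illustrative for a 128 GB box; leave headroom for the OS and off-heap usage):

spark.driver.memory    25g
spark.executor.memory  25g

The same can be passed on the command line, e.g. spark-submit --driver-memory 25g --executor-memory 25g ...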