Spark application - java.lang.OutOfMemoryError: Java heap space

I am using Spark standalone on a single machine with 128 GB of memory and 32 cores. The following are the settings I think are relevant to my problem:

spark.storage.memoryFraction     0.35
spark.default.parallelism        50
spark.sql.shuffle.partitions     50

I have a Spark application with a loop over 1000 devices. In each iteration (one device) it prepares a feature vector and then calls MLlib k-means. Around the 25th to 30th iteration of the loop (processing the 25th to 30th device), it runs into the error "java.lang.OutOfMemoryError: Java heap space".

I tried lowering memoryFraction from 0.7 to 0.35, but it didn't help. I also tried raising parallelism/partitions to 200 with no luck. The JVM options are "-Xms25G -Xmx25G -XX:MaxPermSize=512m". My data size is only about 2 GB.
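
For reference, these settings are supplied when launching the application; a minimal sketch for a Spark 1.x standalone deployment might look like the following (the master URL, main class, and JAR name are placeholders, not taken from this post):

  # Sketch only: master URL, class, and JAR are placeholders; values match the settings above
  ./bin/spark-submit \
    --master spark://<master-host>:7077 \
    --class com.example.DeviceClustering \
    --driver-memory 25g \
    --conf spark.storage.memoryFraction=0.35 \
    --conf spark.default.parallelism=50 \
    --conf spark.sql.shuffle.partitions=50 \
    device-clustering.jar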

Here is the stack trace:

java.lang.OutOfMemoryError: Java heap space
  at java.util.Arrays.copyOf(Arrays.java:2271)
  at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
  at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
  at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
  at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1841)
  at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1533)
  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
  at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
  at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
  at scala.collection.mutable.HashMap$$anonfun$writeObject$1.apply(HashMap.scala:138)
  at scala.collection.mutable.HashMap$$anonfun$writeObject$1.apply(HashMap.scala:136)
  at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
  at scala.collection.mutable.HashTable$class.serializeTo(HashTable.scala:125)
  at scala.collection.mutable.HashMap.serializeTo(HashMap.scala:40)
  at scala.collection.mutable.HashMap.writeObject(HashMap.scala:136)
  at sun.reflect.GeneratedMethodAccessor116.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
  at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
  at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
  at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
  at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
  at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)

At the beginning the application looks fine, but as it runs and processes more and more devices, the Java heap fills up gradually and the memory is not released by the JVM. How can I diagnose and fix such a problem?

asked Nov 27 '15 by wdz


3 Answers

You can always use profiling tools like VisualVM to monitor memory growth. Hopefully you are using a 64-bit JVM and not a 32-bit one: a 32-bit process can only use about 2 GB of memory, so the memory settings above would essentially be of no use. Hope this helps.
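
For example, heap diagnostics for the driver JVM could be captured along these lines (a sketch; the dump path and <driver-pid> are placeholders):

  # Ask HotSpot to write a heap dump when the OOM occurs (path is a placeholder)
  --conf "spark.driver.extraJavaOptions=-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/driver-heap.hprof"

  # While the job runs, watch GC activity and the largest object types
  jstat -gcutil <driver-pid> 5s
  jmap -histo:live <driver-pid> | head -n 30

The resulting .hprof file (or a live VisualVM session attached to the driver process) usually shows which objects keep accumulating across the loop iterations.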

answered Sep 21 '22 by R Sawant


Apart from driver and executor memory, I would suggest trying the following options (sketched below):

  1. Switch to Kryo serialization - http://spark.apache.org/docs/latest/tuning.html#data-serialization
  2. Use MEMORY_AND_DISK_SER_2 for RDD persistence.

Also, it would be good if you could post the code.
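
A minimal Scala sketch of both suggestions, assuming a Spark 1.x application (the case class, input path, and parsing function are hypothetical placeholders, not the asker's code):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.storage.StorageLevel

  // Hypothetical per-device feature record; register your own classes with Kryo
  case class DeviceFeatures(deviceId: String, features: Array[Double])

  // Hypothetical parser: one device's comma-separated feature vector per line
  def parseLine(line: String): DeviceFeatures = {
    val parts = line.split(",")
    DeviceFeatures(parts.head, parts.tail.map(_.toDouble))
  }

  val conf = new SparkConf()
    .setAppName("device-kmeans")
    // 1. Kryo serialization instead of the default Java serialization
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .registerKryoClasses(Array(classOf[DeviceFeatures]))
  val sc = new SparkContext(conf)

  // 2. Keep cached data serialized and allow spilling to disk
  val features = sc.textFile("hdfs:///path/to/features")   // placeholder path
    .map(parseLine)
    .persist(StorageLevel.MEMORY_AND_DISK_SER_2)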

answered Sep 19 '22 by Sumit


JVM options alone are not sufficient for configuring Spark memory; you also need to set spark.driver.memory (for the driver, obviously) and spark.executor.memory (for the workers). Both default to 1 GB. See this thorough guide for more information. Actually, I urge you to read it: there is an awful lot of material there, and getting acquainted with it will definitely pay off later on.
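
For example, a sketch of setting both properties (the values are illustrative, not tuned for this workload):

  # On the spark-submit command line
  --driver-memory 25g --executor-memory 25g

  # Or equivalently in conf/spark-defaults.conf
  spark.driver.memory    25g
  spark.executor.memory  25g

Note that spark.driver.memory only takes effect if it is set before the driver JVM starts, so in client mode it has to go on the command line or in spark-defaults.conf rather than in SparkConf.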

answered Sep 20 '22 by mehmetminanc