I am using Spark Standalone on a single machine with 128 GB of memory and 32 cores. The following settings seem relevant to my problem:
spark.storage.memoryFraction 0.35
spark.default.parallelism 50
spark.sql.shuffle.partitions 50
I have a Spark application with a loop over 1000 devices. In each iteration it prepares a feature vector for one device and then calls MLlib's k-means. Around the 25th to 30th iteration of the loop (processing the 25th to 30th device), it fails with "java.lang.OutOfMemoryError: Java heap space".
I tried lowering spark.storage.memoryFraction from 0.7 to 0.35, but it didn't help. I also tried raising parallelism/partitions to 200, with no luck. The JVM options are "-Xms25G -Xmx25G -XX:MaxPermSize=512m", and my data size is only about 2 GB.
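Roughly, the loop looks like this (a simplified sketch, not my actual code; the names rawData, deviceIds, and the k-means parameters are illustrative):

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// One k-means run per device; ~1000 devices in total
for (deviceId <- deviceIds) {
  val features = rawData
    .filter(_.deviceId == deviceId)
    .map(r => Vectors.dense(r.values))  // prepare the feature vector
    .cache()
  val model = KMeans.train(features, 4, 20)  // k = 4, maxIterations = 20
  // ... use model, e.g. save the cluster centers ...
}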
Here is the stack trace:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1841)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1533)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at scala.collection.mutable.HashMap$$anonfun$writeObject$1.apply(HashMap.scala:138)
at scala.collection.mutable.HashMap$$anonfun$writeObject$1.apply(HashMap.scala:136)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashTable$class.serializeTo(HashTable.scala:125)
at scala.collection.mutable.HashMap.serializeTo(HashMap.scala:40)
at scala.collection.mutable.HashMap.writeObject(HashMap.scala:136)
at sun.reflect.GeneratedMethodAccessor116.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
At the beginning the application looks fine, but as it runs and processes more and more devices, the Java heap gradually fills up and the memory is never released by the JVM. How can I diagnose and fix such a problem?
You can always use a profiler such as VisualVM to monitor memory growth. Hopefully you are using a 64-bit JVM and not a 32-bit one: a 32-bit process can only use about 2 GB of memory, so your heap settings would essentially have no effect. Hope this helps.
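To see what is actually on the heap at the moment of failure, you can also have the JVM write a heap dump on OOM and open it in VisualVM afterwards. For example, in spark-defaults.conf (the flags are standard HotSpot options; the dump paths are my assumption):

spark.driver.extraJavaOptions    -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/driver.hprof
spark.executor.extraJavaOptions  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/executor.hprof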
Apart from driver and executor memory, I would suggest trying a few other options, as sketched below.
Also, it would help if you could post the code.
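For example (illustrative settings, not an exhaustive or definitive list; the values need tuning for your workload). Since the stack trace shows Java serialization of a large HashMap, switching to Kryo may reduce serialization overhead:

spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max  256m
spark.rdd.compress               true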
JVM options alone are not sufficient for configuring Spark memory; you also need to set spark.driver.memory (for the driver, obviously) and spark.executor.memory (for the workers). Both default to 1 GB. See this thorough guide for more information. Actually, I urge you to read it: there is a lot of material there, and getting acquainted with it will definitely pay off later on.
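For example, in spark-defaults.conf (the sizes are illustrative for a 128 GB box; leave headroom for the OS and off-heap usage):

spark.driver.memory    25g
spark.executor.memory  25g

The same can be passed on the command line, e.g. spark-submit --driver-memory 25g --executor-memory 25g ...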