 

Spark 2.0 memory fraction

I am working with Spark 2.0, the job starts by sorting the input data and storing its output on HDFS.

I was getting out of memory errors, the solution was to increase the value of "spark.shuffle.memoryFraction" from 0.2 to 0.8 and this solved the problem. But in the documentation I have found that this is a deprecated parameter.

As I understand, it was replaced by "spark.memory.fraction". How to modify this parameter while taking into account the sort and storage on HDFS?
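For context, here is a minimal sketch of the kind of setup the question describes: a Spark 2.0 job that sorts text input and writes the result to HDFS, with the legacy shuffle fraction raised from 0.2 to 0.8. The application name and paths are placeholders, not taken from the original job.

```scala
import org.apache.spark.sql.SparkSession

// Placeholder app name and HDFS paths; the legacy setting below is the one
// the question raises from 0.2 to 0.8 (deprecated in Spark 2.0).
val spark = SparkSession.builder()
  .appName("sort-job")
  .config("spark.shuffle.memoryFraction", "0.8")
  .getOrCreate()

val input = spark.read.textFile("hdfs:///path/to/input")   // Dataset[String]
input.sort("value")                                        // sort by the single "value" column
  .write.text("hdfs:///path/to/output")                    // store the sorted output on HDFS
```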

asked Sep 23 '16 by syl

People also ask

What is memory fraction?

spark.memory.fraction – the fraction of (JVM heap space - 300 MB) used for the execution and storage regions (default 0.6).

How do I fix out of memory error in Spark?

You can often resolve it by reducing the amount of data per partition: increase the value of spark.sql.shuffle.partitions.
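A hedged sketch of how that setting could be raised (the value 400 is an arbitrary illustration, not a tuned recommendation):

```scala
import org.apache.spark.sql.SparkSession

// More shuffle partitions means less data per task; 400 is only an example.
val spark = SparkSession.builder()
  .appName("shuffle-partitions-example")
  .config("spark.sql.shuffle.partitions", "400") // default is 200
  .getOrCreate()

// It can also be changed at runtime for subsequent shuffles:
spark.conf.set("spark.sql.shuffle.partitions", "400")
```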

Why is KRYO faster?

Kryo is significantly faster and more compact than Java serialization (often as much as 10x), but it does not support all Serializable types and requires you to register the classes you'll use in the program in advance for best performance. It is not used by default because not every java.io.Serializable type is supported by Kryo out of the box.
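A small sketch of enabling Kryo and registering classes up front; MyRecord and MyKey are hypothetical application classes used only for illustration:

```scala
import org.apache.spark.SparkConf

// Hypothetical application classes, defined only to make the example self-contained.
case class MyRecord(id: Long, value: String)
case class MyKey(id: Long)

// Enable Kryo and register the classes that will be serialized.
val conf = new SparkConf()
  .setAppName("kryo-example")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[MyRecord], classOf[MyKey]))
```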


1 Answer

From the documentation:

Although there are two relevant configurations, the typical user should not need to adjust them as the default values are applicable to most workloads:

  • spark.memory.fraction expresses the size of M as a fraction of the (JVM heap space - 300MB) (default 0.6). The rest of the space (40%)
    is reserved for user data structures, internal metadata in Spark, and safeguarding against OOM errors in the case of sparse and unusually
    large records.
  • spark.memory.storageFraction expresses the size of R as a fraction of M (default 0.5). R is the storage space within M where cached blocks are immune to being evicted by execution.

The value of spark.memory.fraction should be set in order to fit this amount of heap space comfortably within the JVM’s old or “tenured” generation. Otherwise, when much of this space is used for caching and execution, the tenured generation will be full, which causes the JVM to significantly increase time spent in garbage collection.
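To make that concrete, a hedged sketch of how the unified memory settings could be adjusted for a Spark 2.0 job; the values are illustrative, not recommendations, and the same keys can equally be passed to spark-submit via --conf:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative values only: give execution/storage a larger share of
// (heap - 300 MB) and reserve less of that share for cached blocks.
val spark = SparkSession.builder()
  .appName("memory-fraction-example")
  .config("spark.memory.fraction", "0.8")        // default 0.6
  .config("spark.memory.storageFraction", "0.3") // default 0.5
  .getOrCreate()
```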

In Spark 1.6.2 I would modify spark.storage.memoryFraction instead.


As a side note, are you sure that you understand how your job behaves?

It's typical to fine-tune a job by starting with the memoryOverhead, the number of cores, etc., and only then moving on to the attribute you modified.
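As a hedged illustration of that tuning order (all values are arbitrary examples, and the memoryOverhead key assumes a YARN deployment, which the original question does not state):

```scala
import org.apache.spark.sql.SparkSession

// Resource-level knobs to tune first; values are examples only.
// spark.yarn.executor.memoryOverhead applies to YARN deployments
// (later Spark versions rename it to spark.executor.memoryOverhead).
val spark = SparkSession.builder()
  .appName("resource-tuning-example")
  .config("spark.executor.memory", "8g")
  .config("spark.executor.cores", "4")
  .config("spark.yarn.executor.memoryOverhead", "1024") // in MB
  .getOrCreate()
```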

answered Oct 05 '22 by gsamaras