I am working on a project where I have to tune Spark's performance. I have found four parameters that seem most important for tuning Spark's performance. They are as follows:
I wanted to know whether I am going in the right direction. Please also let me know if I have missed any other parameters.
Thanks in advance.
Honestly, this is quite broad to answer. The right path to optimizing performance is mainly described in the official documentation, in the section on Tuning Spark.
Generally speaking, there are many factors involved in optimizing Spark jobs:
It mainly centers on data serialization, memory tuning, and a trade-off between precision and approximation techniques to get the job done fast.
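As a concrete illustration, here is a minimal sketch of the serialization side of that guide: switching to Kryo and registering application classes with it. The `SensorReading` class is a hypothetical placeholder, and the settings are illustrative rather than recommendations:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Hypothetical application class, used only to illustrate registration.
case class SensorReading(id: Long, value: Double)

// Kryo is generally faster and more compact than the default Java
// serializer; registering classes avoids writing full class names
// alongside each serialized object.
val conf = new SparkConf()
  .setAppName("tuning-sketch")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[SensorReading]))

val spark = SparkSession.builder().config(conf).getOrCreate()
```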
EDIT:
Courtesy of @zero323 :
I'd point out that all but one of the options mentioned in the question are deprecated and used only in legacy mode.
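For context on that comment: since Spark 1.6, the unified memory manager has replaced the old fixed-fraction memory settings, which are honored only when legacy mode is explicitly enabled. A minimal before/after sketch, assuming the options in question were the classic memory-fraction settings (values shown are the documented defaults, not tuning advice):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()

// Legacy mode (pre-1.6 behavior): the old fixed fractions take effect
// only when legacy mode is explicitly switched on.
// conf.set("spark.memory.useLegacyMode", "true")
// conf.set("spark.storage.memoryFraction", "0.6")  // legacy storage share
// conf.set("spark.shuffle.memoryFraction", "0.2")  // legacy shuffle share

// Unified memory manager (Spark 1.6+): execution and storage share one
// region and can borrow space from each other as needed.
conf.set("spark.memory.fraction", "0.6")        // share of (heap - 300MB)
conf.set("spark.memory.storageFraction", "0.5") // storage's eviction-safe share
```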