I am working on a project where I have to tune Spark's performance. I have found four parameters that seem most important for tuning Spark's performance. They are as follows:
I wanted to know whether I am going in the right direction. Please also let me know if I have missed any other parameters.
Thanks in advance.
Honestly, this is quite broad to answer. The right path to optimizing performance is mainly described in the official documentation, in the section on Tuning Spark.
Generally speaking, there are many factors involved in optimizing Spark jobs:
It mainly centers on data serialization, memory tuning, and a trade-off between precision and approximation techniques to get the job done fast.
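As a concrete illustration, here is a minimal sketch of the serialization side of that guide: switching to Kryo and registering application classes with it. The `SensorReading` class is a hypothetical placeholder, and the settings are illustrative rather than recommendations:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Hypothetical application class, used only to illustrate registration.
case class SensorReading(id: Long, value: Double)

// Kryo is generally faster and more compact than the default Java
// serializer; registering classes avoids writing full class names
// alongside each serialized object.
val conf = new SparkConf()
  .setAppName("tuning-sketch")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[SensorReading]))

val spark = SparkSession.builder().config(conf).getOrCreate()
```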
EDIT:
Courtesy of @zero323 :
I'd point out that all but one of the options mentioned in the question are deprecated and used only in legacy mode.
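For context on that comment: since Spark 1.6, the unified memory manager has replaced the old fixed-fraction memory settings, which are honored only when legacy mode is explicitly enabled. A minimal before/after sketch, assuming the options in question were the classic memory-fraction settings (values shown are the documented defaults, not tuning advice):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()

// Legacy mode (pre-1.6 behavior): the old fixed fractions take effect
// only when legacy mode is explicitly switched on.
// conf.set("spark.memory.useLegacyMode", "true")
// conf.set("spark.storage.memoryFraction", "0.6")  // legacy storage share
// conf.set("spark.shuffle.memoryFraction", "0.2")  // legacy shuffle share

// Unified memory manager (Spark 1.6+): execution and storage share one
// region and can borrow space from each other as needed.
conf.set("spark.memory.fraction", "0.6")        // share of (heap - 300MB)
conf.set("spark.memory.storageFraction", "0.5") // storage's eviction-safe share
```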