
Profiling a Scala Spark application

I would like to profile my Spark Scala applications to find the parts of the code that I need to optimize. I enabled -Xprof in --driver-java-options, but it is not much help because its output is too granular. I just want to know how much time each function call in my application takes. In other Stack Overflow questions, many people suggested YourKit, but it is not inexpensive, so I would like to use something that is free of cost.

Is there a better way to do this?

svKris asked Jun 17 '15 18:06




3 Answers

I would recommend using the web UI that Spark provides. It exposes a lot of information and metrics about time, steps, network usage, and so on.

You can read more about it here: https://spark.apache.org/docs/latest/monitoring.html

Also, the new Spark version (1.4.0) includes a nice visualizer that helps you understand the steps and stages of your Spark jobs.
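
The live UI is only available while the application is running, so it can help to persist the event log and to label jobs so that they are easier to map back to code. Below is a minimal sketch, assuming `spark-core` is on the classpath; the application name, the `local[*]` master, and the `/tmp/spark-events` directory are placeholders chosen for illustration, and the event log directory must already exist.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object UiFriendlyJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("ui-friendly-job")          // hypothetical application name
      .setMaster("local[*]")                  // assumption: local run; usually passed via spark-submit
      .set("spark.eventLog.enabled", "true")  // keep UI data after the application exits
      .set("spark.eventLog.dir", "file:///tmp/spark-events") // assumed path; must already exist

    val sc = new SparkContext(conf)
    try {
      // The description is shown next to the job in the UI's job list.
      sc.setJobDescription("split lines and count words")
      val counts = sc.parallelize(Seq("a b", "b c", "c c"))
        .flatMap(_.split(" "))
        .map(word => (word, 1))
        .reduceByKey(_ + _)
        .collect()
      counts.foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

With the event log enabled, the history server described in the monitoring documentation can rebuild the UI for finished applications.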

hveiga answered Nov 16 '22 16:11


As you said, profiling a distributed process is trickier than profiling a single JVM process, but there are ways to achieve this.

You can use sampling as a thread profiling method. Add a Java agent to the executors that captures stack traces, then aggregate those stack traces to see which methods your application spends the most time in.

For example, you can use Etsy's statsd-jvm-profiler Java agent, configure it to send the stack traces to InfluxDB, and then visualize the aggregated traces as flame graphs.
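
To make the sampling idea concrete, here is a rough in-process sketch. It is not how statsd-jvm-profiler is implemented; `StackSampler`, `SamplerDemo`, and `busyWork` are hypothetical names, and it only samples the JVM it runs in, so on a real cluster the same logic would have to run inside each executor (for instance from an agent). It periodically captures stack traces with `Thread.getAllStackTraces` and counts them in the "folded" form (root-first frames joined by `;`, then a sample count) that flame graph tools commonly accept.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.collection.mutable

/** Hypothetical sampler: counts how often each call stack is observed. */
object StackSampler {
  // folded stack ("root;...;leaf") -> number of samples in which it was seen
  private val counts = mutable.Map.empty[String, Long].withDefaultValue(0L)

  /** Record one sample of every other thread whose stack touches `classPrefix`. */
  def sampleOnce(classPrefix: String): Unit = synchronized {
    val self = Thread.currentThread()
    val it = Thread.getAllStackTraces().entrySet().iterator()
    while (it.hasNext) {
      val entry = it.next()
      val frames = entry.getValue
      val relevant = (entry.getKey ne self) &&
        frames.exists(_.getClassName.startsWith(classPrefix))
      if (relevant) {
        // Stack traces are leaf-first; folded stacks are written root-first.
        val folded = frames.reverseIterator
          .map(f => f.getClassName + "." + f.getMethodName)
          .mkString(";")
        counts(folded) += 1
      }
    }
  }

  /** Folded-stack lines, most frequently sampled first. */
  def foldedLines: Seq[String] = synchronized {
    counts.toSeq.sortBy { case (_, n) => -n }.map { case (stack, n) => stack + " " + n }
  }
}

object SamplerDemo {
  /** Stand-in for application code: spins for roughly `millis` milliseconds. */
  def busyWork(millis: Long): Double = {
    val deadline = System.nanoTime() + millis * 1000000L
    var acc = 0.0
    while (System.nanoTime() < deadline) acc += math.sqrt(acc + 1.0)
    acc
  }

  def main(args: Array[String]): Unit = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val task = new Runnable {
      def run(): Unit = StackSampler.sampleOnce("SamplerDemo")
    }
    scheduler.scheduleAtFixedRate(task, 0, 10, TimeUnit.MILLISECONDS) // sample every 10 ms
    busyWork(500)
    scheduler.shutdown()
    scheduler.awaitTermination(1, TimeUnit.SECONDS)
    StackSampler.foldedLines.take(3).foreach(println)
  }
}
```

The stacks with the highest counts are where the sampled threads spent most of their time, and the printed lines can be saved to a file and passed to a flame graph renderer.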

For more information, check out my post on profiling Spark applications: https://www.paypal-engineering.com/2016/09/08/spark-in-flames-profiling-spark-applications-using-flame-graphs/

aviemzur answered Nov 16 '22 16:11


I recently wrote an article and a script that wraps spark-submit and generates a flame graph after the Spark application has run.

Here's the article: https://www.linkedin.com/pulse/profiling-spark-applications-one-click-michael-spector

Here's the script: https://raw.githubusercontent.com/spektom/spark-flamegraph/master/spark-submit-flamegraph

Just use it in place of the regular spark-submit.

Michael Spector answered Nov 16 '22 17:11