I would like to profile my sSpark scala applications to figure out the parts of the code which i have to optimize. I enabled -Xprof
in --driver-java-options
but this is not of much help to me as it gives lot of granular details. I am just interested to know how much time each function call in my application is taking time.
As is other Stack Overflow questions, many people suggested YourKit but it is not inexpensive. So i would like to use something which is not costly in fact free of cost.
Are there any better ways to solve this ?
The Profiling tool analyzes both CPU or GPU generated event logs and generates information which can be used for debugging and profiling Apache Spark applications. The output information contains the Spark version, executor details, properties, etc.
I would recommend you to use directly the UI that spark provides. It provides a lot of information and metrics regarding time, steps, network usage, etc...
You can check more about it here: https://spark.apache.org/docs/latest/monitoring.html
Also, in the new Spark version (1.4.0) there is a nice visualizer to understand the steps and stages of your spark jobs.
As you said, profiling a distributed process is trickier than profiling a single JVM process, but there are ways to achieve this.
You can use sampling as a thread profiling method. Add a java agent to the executors that will capture stack traces, then aggregate over these stack traces to see which methods your application spends the most time in.
For example, you can use Etsy's statsd-jvm-profiler java agent and configure it to send the stack traces to InfluxDB and then aggregate them using Flame Graphs.
For more information, check out my post on profiling Spark applications: https://www.paypal-engineering.com/2016/09/08/spark-in-flames-profiling-spark-applications-using-flame-graphs/
I've written an article and a script recently, that wraps spark-submit
, and generates a flame graph after executing a Spark application.
Here's the article: https://www.linkedin.com/pulse/profiling-spark-applications-one-click-michael-spector
Here's the script: https://raw.githubusercontent.com/spektom/spark-flamegraph/master/spark-submit-flamegraph
Just use it instead of regular spark-submit
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With