Does anyone know how to do performance profiling of all java code running in a Hadoop cluster?
I will explain with a simple example. When developing Java code locally, we can run YourKit to measure the % of CPU taken by each method of each class. We can see that class A calls method X, that this call takes 90% of the whole app's execution time, and then fix the inefficiency in the code.
But when we run a MapReduce job in the cluster, I would also like to see what is sluggish: our map/reduce code, or the framework itself. So I would like a service that records each class/method call and the % of time spent executing it, gathers this somewhere in HDFS, and then lets me analyze the method call tree with its CPU consumption.
Question: does anyone know if such a solution exists?
P.S. I understand that such a thing will slow down the cluster, and that it should be done either on a test cluster or in agreement with the customer. The question for now is simply: does such a thing exist? Thanks.
I solved the problem. Here http://ihorbobak.com/index.php/2015/08/05/cluster-profiling/ you can find detailed instructions on how to do this.
A short summary of how the profiling is done:
Flame Graphs were invented by Brendan Gregg: http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html. There is a very good video by Brendan that explains how they work: https://www.youtube.com/watch?v=nZfNehCzGdw. There is also a very good book by the same author, “Systems Performance: Enterprise and the Cloud”, which I highly recommend reading.
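For readers who haven't seen it, the basic recipe from Brendan Gregg's page boils down to a short shell session. This is an illustrative sketch, not the exact steps from the blog post above; the sampling rate, duration, and file names are my own choices:

```shell
# Sample on-CPU stacks across the whole system at 99 Hz for 30 seconds.
# (99 Hz is a common choice to avoid lockstep with timer interrupts.)
perf record -F 99 -a -g -- sleep 30

# Fold the raw stack samples and render an interactive SVG using the
# scripts from https://github.com/brendangregg/FlameGraph.
perf script | ./stackcollapse-perf.pl > out.folded
./flamegraph.pl out.folded > flamegraph.svg
```

Note that for Java frames to show up with readable symbols, the JVM needs frame pointers preserved (-XX:+PreserveFramePointer on JDK 8u60+) and a symbol map for perf, e.g. via perf-map-agent.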
Sorry for bumping this old thread, but I feel this might be useful for other people as well.
We actually had a similar problem. One of our production jobs was producing sub-optimal throughput without any indication why. Since we wanted to limit the dependencies on the cluster nodes and be able to sample different frameworks such as Spark, Hadoop, and even non-JVM-based applications, we decided to build our own distributed profiler based on perf, and like Ihor, we are using Flame Graphs for visualization.
The software is currently in an alpha state (https://github.com/cerndb/Hadoop-Profiler) and only supports on-CPU profiling, but it already showed its potential when we analyzed this job.
It basically works like this in a Hadoop context:
If you're interested, we did a more detailed write-up on this:
https://db-blog.web.cern.ch/blog/joeri-hermans/2016-04-hadoop-performance-troubleshooting-stack-tracing-introduction
I hope this helps!