 

What to use to have graphical view of Spark's memory usage (with YARN)?

I was going through one of the presentations on Spark memory management and wanted to know how to get a good graphical picture of executor memory usage (similar to what was shown in the presentation), to better understand out-of-memory errors. Also, what is the best way to analyze off-heap memory usage in Spark executors? How can I find the amount of off-heap memory used as a function of time?

I looked into Ganglia, but it only provides node-level metrics, and I found it hard to infer executor-level memory usage from those.

asked Sep 14 '16 by user401445

People also ask

How do I check memory usage on Spark?

The YARN ResourceManager UI displays the total memory per application: it shows the total memory consumption of a Spark app across its executors and driver. Checking the Spark UI per job is another option, though it is not always practical.

How do I get Spark executor memory?

For example, on a 10-node cluster with 150 total cores and 64 GB of RAM per node: number of available executors = (total cores / num-cores-per-executor) = 150 / 5 = 30. Leaving 1 executor for the ApplicationMaster gives --num-executors = 29. Number of executors per node = 30 / 10 = 3. Memory per executor = 64 GB / 3 ≈ 21 GB.
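The sizing arithmetic above can be sketched as a short script. The cluster figures (10 nodes, 15 cores and 64 GB of RAM each) are assumptions implied by the numbers in the answer, not values from the original question:

```python
# Executor sizing sketch; assumed cluster: 10 nodes, 15 cores and 64 GB RAM each
total_cores = 150
cores_per_executor = 5        # a common rule of thumb for good HDFS throughput
num_nodes = 10
memory_per_node_gb = 64

available_executors = total_cores // cores_per_executor          # 150 / 5 = 30
num_executors = available_executors - 1                          # 29, one slot left for the ApplicationMaster
executors_per_node = available_executors // num_nodes            # 30 / 10 = 3
memory_per_executor_gb = memory_per_node_gb // executors_per_node  # 64 / 3 = 21

print(num_executors, executors_per_node, memory_per_executor_gb)  # 29 3 21
```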

What is stored in yarn memory overhead?

Memory overhead is the amount of off-heap memory allocated to each executor. By default, memory overhead is set to either 10% of executor memory or 384 MB, whichever is higher.
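That default rule can be expressed directly. This is just a sketch of the documented formula (max of 10% of executor memory and 384 MB), not Spark's actual implementation:

```python
def yarn_memory_overhead_mb(executor_memory_mb: int) -> int:
    """Default off-heap overhead YARN reserves per executor:
    the larger of 10% of executor memory and 384 MB."""
    return max(int(0.10 * executor_memory_mb), 384)

# Small executors hit the 384 MB floor; larger ones get 10%.
print(yarn_memory_overhead_mb(2048))   # 384  (10% would only be ~205 MB)
print(yarn_memory_overhead_mb(8192))   # 819  (10% of 8192 MB)
```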

How do I monitor my Spark application?

Click Analytics > Spark Analytics > Open the Spark Application Monitoring Page. Click Monitor > Workloads, and then click the Spark tab. This page displays the user names of the clusters that you are authorized to monitor and the number of applications that are currently running in each cluster.


1 Answer

I've been thinking about a similar tool!

I think org.apache.spark.scheduler.SparkListener is the interface to all the low-level metrics in Apache Spark with onExecutorMetricsUpdate being the method to look at when developing a higher-level monitoring tool.

You could also monitor the JVM through the JMX interface, but that might be too low-level, and it definitely lacks the contextual information about how Spark uses the resources.
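A middle ground between a custom SparkListener and raw JMX is Spark's built-in metrics system (a Dropwizard-based registry configured via conf/metrics.properties), which can expose per-executor JVM metrics over JMX or push them to Graphite for graphing. A sketch, where the Graphite host is a hypothetical placeholder:

```properties
# Sketch of conf/metrics.properties; sink and source class names
# come from Spark's metrics system.
*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
# graphite.example.com is a hypothetical host for illustration
*.sink.graphite.host=graphite.example.com
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds
# Enable JVM heap/non-heap metrics on executors
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
```

With the GraphiteSink enabled, each executor reports its JVM heap and non-heap usage over time, which a Graphite or Grafana dashboard can then plot per executor rather than per node.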

answered Oct 14 '22 by Jacek Laskowski