We have a cluster of about 20 nodes. The cluster is shared among many users and jobs, so it is very difficult for me to observe my job and collect metrics such as CPU usage, I/O, network, memory, etc.
How can I get metrics at the job level?
PS: The cluster already has Ganglia installed, but I am not sure how to make it work at the job level. What I would like to do is monitor the resources used by the cluster to execute my job only.
You can get Spark job metrics from the Spark History Server, which displays information about:
- A list of scheduler stages and tasks
- A summary of RDD sizes and memory usage
- Environmental information
- Information about the running executors
1. Set spark.eventLog.enabled to true before starting the Spark application. This configures Spark to log Spark events to persistent storage.
2. Set spark.history.fs.logDirectory to the directory that contains the application event logs to be loaded by the history server.
3. Start the history server by executing: ./sbin/start-history-server.sh (see the example setup below).
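For example, a minimal setup might look like the following sketch, assuming the event logs are written to an HDFS directory hdfs:///spark-logs (the path is only an illustration; use whatever shared location your cluster provides):

```
# conf/spark-defaults.conf -- example paths, adjust to your cluster
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs:///spark-logs
spark.history.fs.logDirectory    hdfs:///spark-logs
```

Then create the log directory and start the history server:

```
hdfs dfs -mkdir -p /spark-logs
./sbin/start-history-server.sh
```

The history server UI listens on port 18080 by default, and each completed application (i.e. each of your jobs) gets its own entry with per-stage and per-executor metrics. The event-log settings can also be passed per job via spark-submit --conf if you cannot change the cluster-wide defaults.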
Please refer to the link below for more information:
http://spark.apache.org/docs/latest/monitoring.html