The Spark Web UI shows some interesting information about the resources available to the cluster as a whole.
I'm specifically interested in the cluster-wide values it reports: total and used cores, total and used memory, and the list of workers.
How can I query these pieces of information about the overall cluster programmatically?
According to the recommendations discussed above (assuming a 10-node cluster with 16 cores per node): leave 1 core per node for the Hadoop/YARN daemons, so the number of cores available per node = 16 - 1 = 15. Total available cores in the cluster = 15 x 10 = 150. Number of available executors = total cores / num-cores-per-executor = 150 / 5 = 30.
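As a quick sanity check, here is the same arithmetic as a short Python sketch; the node and core counts are the example values assumed above:

nodes = 10                # nodes in the cluster (assumed example value)
cores_per_node = 16       # physical cores per node (assumed example value)
daemon_cores = 1          # reserved per node for Hadoop/YARN daemons
cores_per_executor = 5    # recommended cores per executor

usable_cores_per_node = cores_per_node - daemon_cores    # 15
total_cores = usable_cores_per_node * nodes              # 150
num_executors = total_cores // cores_per_executor        # 30

print(total_cores, num_executors)  # prints: 150 30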
An executor is a JVM process that runs on a worker node. Executors run tasks and keep data in memory or on disk across them. Each application has its own executors. The number of executors can be specified inside the SparkConf or via the --num-executors command-line flag.
The consensus in most Spark tuning guides is that 5 cores per executor is the optimal number for parallel processing. Both settings can be pinned explicitly, as in the sketch below.
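A minimal PySpark sketch, assuming the 30-executor / 5-core sizing derived above; the application name is a placeholder, and spark.executor.instances is the SparkConf equivalent of the --num-executors flag:

from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = (
    SparkConf()
    .setAppName("cluster-sizing-example")   # placeholder app name
    .set("spark.executor.instances", "30")  # same effect as --num-executors 30
    .set("spark.executor.cores", "5")       # 5 cores per executor
)

spark = SparkSession.builder.config(conf=conf).getOrCreate()

The command-line equivalent would be spark-submit --num-executors 30 --executor-cores 5 your_app.py.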
Spark doesn't really expose this kind of information; it's all held by the Master and only surfaced through the Web UI.
You can, however, use a small hack: the Web UI serves JSON if you append /json/ to a page's URL.
So, going to http://<master-host>:<master-port>/json/
will return just the info you're looking for:
{
  "url": "spark://<host>:<port>",
  "workers": [ ],
  "cores": 0,
  "coresused": 0,
  "memory": 0,
  "memoryused": 0,
  "activeapps": [ ],
  "completedapps": [ ],
  "activedrivers": [ ],
  "status": "ALIVE"
}
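For example, a small Python sketch that fetches this endpoint and prints the cluster-wide totals; the host and port are placeholders for your master's Web UI address (8080 by default):

import json
from urllib.request import urlopen

MASTER_UI = "http://<master-host>:8080"  # placeholder: your master's Web UI

with urlopen(f"{MASTER_UI}/json/") as resp:
    info = json.load(resp)

print("workers:", len(info["workers"]))
print("cores:  ", info["cores"], "used:", info["coresused"])
print("memory: ", info["memory"], "used:", info["memoryused"])

Note that this only applies to the standalone Master; on YARN or Kubernetes the resource totals live with the cluster manager, not with Spark.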