
Programmatically determine number of cores and amount of memory available to Spark

Tags:

apache-spark

The Spark Web UI shows some interesting information about the resources available to the cluster as a whole.

Spark Web UI

I'm specifically interested in the values for:

  • Workers
  • Cores
  • Memory

How can I query these pieces of information about the overall cluster programmatically?

Nick Chammas asked Apr 14 '15 23:04

People also ask

How does Spark calculate number of cores?

According to the recommendations discussed above: leave 1 core per node for Hadoop/YARN daemons, so the number of cores available per node = 16 - 1 = 15. The total number of available cores in the cluster = 15 x 10 = 150. The number of available executors = (total cores / num cores per executor) = 150 / 5 = 30.
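As a minimal sketch of that arithmetic (the node count, cores per node, and cores per executor below are example values from the text above, not anything queried from Spark):

nodes = 10                  # machines in the cluster (example value)
cores_per_node = 16         # physical cores per machine (example value)
cores_per_executor = 5      # common tuning-guide recommendation

usable_cores_per_node = cores_per_node - 1                 # leave 1 core for Hadoop/YARN daemons -> 15
total_usable_cores = usable_cores_per_node * nodes         # 15 x 10 = 150
num_executors = total_usable_cores // cores_per_executor   # 150 / 5 = 30
print(num_executors)  # 30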

How do you determine the number of executors and memory in Spark?

An executor is a JVM process which runs on a worker node. It runs tasks and keeps data in memory or on disk storage across them. Each application has its own executors. The number of executors can be specified inside the SparkConf or via the --num-executors command-line flag.
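As a hedged sketch, the same sizing can also be set in code via SparkConf rather than the command-line flag. The property names below (spark.executor.instances, spark.executor.cores, spark.executor.memory) are standard Spark configuration keys; the app name and the values are placeholders:

from pyspark import SparkConf
from pyspark.sql import SparkSession

# Equivalent to passing --num-executors / --executor-cores / --executor-memory
# to spark-submit (executor instance counts are honored on YARN/Kubernetes deployments).
conf = (SparkConf()
        .setAppName("executor-sizing-example")       # placeholder app name
        .set("spark.executor.instances", "30")       # number of executors
        .set("spark.executor.cores", "5")            # cores per executor
        .set("spark.executor.memory", "4g"))         # memory per executor

spark = SparkSession.builder.config(conf=conf).getOrCreate()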

What is number of cores in Spark?

The consensus in most Spark tuning guides is that 5 cores per executor is the optimum number for parallel processing.


1 Answer

Spark doesn't really expose this kind of information; it's all held by the Master and only surfaced through the Web UI.

You can, however, use a small hack: the Web UI serves JSON if you append /json/ to a page.

So, going to http://<master-host>:<master-port>/json/ will return just the info you're looking for:

{
  "url": "spark://<host>:<port>",
  "workers": [],
  "cores": 0,
  "coresused": 0,
  "memory": 0,
  "memoryused": 0,
  "activeapps": [],
  "completedapps": [],
  "activedrivers": [],
  "status": "ALIVE"
}
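For example, a small Python script could hit that endpoint and pull out the cluster totals. This is only a sketch: master-host and the port 8080 (the standalone master's default Web UI port) are placeholders for your own setup, and the memory fields are assumed to be reported in megabytes by the standalone master:

import json
from urllib.request import urlopen

MASTER_UI = "http://master-host:8080"   # placeholder master host and Web UI port

with urlopen(MASTER_UI + "/json/") as resp:
    status = json.load(resp)

print("workers:", len(status["workers"]))
print("cores:  ", status["cores"], "used:", status["coresused"])
print("memory:", status["memory"], "MB, used:", status["memoryused"], "MB")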
Marius Soutier answered Oct 29 '22 00:10