The Spark Web UI shows some interesting information about the resources available to the cluster as a whole.
I'm specifically interested in the cluster-wide values it reports: total and used cores, total and used memory, and the list of workers.
How can I query these pieces of information about the overall cluster programmatically?
According to the recommendations discussed above (assuming a 10-node cluster with 16 cores per node): leave 1 core per node for the Hadoop/YARN daemons, so the number of cores available per node = 16 - 1 = 15. Total available cores in the cluster = 15 x 10 = 150. Number of available executors = total cores / num-cores-per-executor = 150 / 5 = 30.
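As a quick sanity check, here is the same arithmetic as a short Python sketch; the node and core counts are the example values assumed above:

nodes = 10                # nodes in the cluster (assumed example value)
cores_per_node = 16       # physical cores per node (assumed example value)
daemon_cores = 1          # reserved per node for Hadoop/YARN daemons
cores_per_executor = 5    # recommended cores per executor

usable_cores_per_node = cores_per_node - daemon_cores    # 15
total_cores = usable_cores_per_node * nodes              # 150
num_executors = total_cores // cores_per_executor        # 30

print(total_cores, num_executors)  # prints: 150 30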
An executor is a JVM process that runs on a worker node. Executors run tasks and keep data in memory or on disk across them. Each application has its own executors. The number of executors can be specified inside the SparkConf or via the --num-executors command-line flag.
The consensus in most Spark tuning guides is that 5 cores per executor is the optimal number for parallel processing. Both settings can be pinned explicitly, as in the sketch below.
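A minimal PySpark sketch, assuming the 30-executor / 5-core sizing derived above; the application name is a placeholder, and spark.executor.instances is the SparkConf equivalent of the --num-executors flag:

from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = (
    SparkConf()
    .setAppName("cluster-sizing-example")   # placeholder app name
    .set("spark.executor.instances", "30")  # same effect as --num-executors 30
    .set("spark.executor.cores", "5")       # 5 cores per executor
)

spark = SparkSession.builder.config(conf=conf).getOrCreate()

The command-line equivalent would be spark-submit --num-executors 30 --executor-cores 5 your_app.py.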
Spark doesn't really expose this kind of information; it's all held by the Master and only surfaced through the Web UI.
You can, however, use a small hack: the Web UI serves JSON if you append /json/ to a page's URL.
So, going to http://<master-host>:<master-port>/json/
will return just the info you're looking for:
{
  "url": "spark://<host>:<port>",
  "workers": [ ],
  "cores": 0,
  "coresused": 0,
  "memory": 0,
  "memoryused": 0,
  "activeapps": [ ],
  "completedapps": [ ],
  "activedrivers": [ ],
  "status": "ALIVE"
}
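For example, a small Python sketch that fetches this endpoint and prints the cluster-wide totals; the host and port are placeholders for your master's Web UI address (8080 by default):

import json
from urllib.request import urlopen

MASTER_UI = "http://<master-host>:8080"  # placeholder: your master's Web UI

with urlopen(f"{MASTER_UI}/json/") as resp:
    info = json.load(resp)

print("workers:", len(info["workers"]))
print("cores:  ", info["cores"], "used:", info["coresused"])
print("memory: ", info["memory"], "used:", info["memoryused"])

Note that this only applies to the standalone Master; on YARN or Kubernetes the resource totals live with the cluster manager, not with Spark.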