 

Spark: get number of cluster cores programmatically

I run my Spark application on a YARN cluster. In my code I use the number of available cores of the queue to create partitions on my dataset:

Dataset ds = ...
ds.coalesce(config.getNumberOfCores());

My question: how can I get the number of available cores of the queue programmatically, rather than from configuration?

asked Nov 20 '17 by Rougher



2 Answers

According to Databricks, if the driver and executors are of the same node type, this is the way to go:

java.lang.Runtime.getRuntime.availableProcessors * (sc.statusTracker.getExecutorInfos.length -1)
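A minimal Java sketch of that approach (the class and app names are mine, and it assumes, as the answer notes, that the driver node has the same core count as the executor nodes):

import org.apache.spark.SparkContext;
import org.apache.spark.sql.SparkSession;

public class ClusterCores {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("cluster-cores") // placeholder app name
                .getOrCreate();

        SparkContext sc = spark.sparkContext();

        // getExecutorInfos() also lists the driver, hence the -1.
        int executorCount = sc.statusTracker().getExecutorInfos().length - 1;

        // availableProcessors() only sees the driver's JVM, so this is an
        // estimate that relies on identical driver and executor node types.
        int totalCores = Runtime.getRuntime().availableProcessors() * executorCount;

        System.out.println("Approximate total executor cores: " + totalCores);
    }
}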
answered Sep 18 '22 by zaxme


Found this while looking for the answer to pretty much the same question.

I found that:

Dataset ds = ...
ds.coalesce(sc.defaultParallelism());

does exactly what the OP was looking for.

For example, my 5-node x 8-core cluster returns 40 for defaultParallelism.
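A minimal Java sketch of this approach, in the spirit of the OP's snippet (the app name and input path are placeholders):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CoalesceToCores {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("coalesce-to-cores") // placeholder app name
                .getOrCreate();

        // On YARN, defaultParallelism is typically the total number of
        // executor cores (spark.executor.instances * spark.executor.cores).
        int parallelism = spark.sparkContext().defaultParallelism();

        Dataset<Row> ds = spark.read().parquet("/path/to/data"); // placeholder path
        Dataset<Row> coalesced = ds.coalesce(parallelism);

        System.out.println("Coalesced to " + parallelism + " partitions");
    }
}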

answered Sep 22 '22 by Steve C