How to get the number of workers (executors) in PySpark?

I need to use this parameter, so how can I get the number of workers? Like in Scala, I can call sc.getExecutorMemoryStatus to get the available number of workers. But in PySpark, it seems there's no API exposed to get this number.

asked Jul 29 '16 by American curl

2 Answers

In Scala, getExecutorStorageStatus and getExecutorMemoryStatus both report the executors including the driver, so the driver has to be filtered out, as in the snippet below:

/**
 * Returns the currently active/registered executors, excluding the driver.
 * @param sc The Spark context to retrieve registered executors from.
 * @return a list of executors, each in the form host:port.
 */
def currentActiveExecutors(sc: SparkContext): Seq[String] = {
  val allExecutors = sc.getExecutorMemoryStatus.map(_._1)
  val driverHost: String = sc.getConf.get("spark.driver.host")
  allExecutors.filter(!_.split(":")(0).equals(driverHost)).toList
}

These methods are not exposed in the Python API, though.

@DanielDarabos's answer also confirms this.

The closest equivalent in Python is reading the configured instance count:

sc.getConf().get("spark.executor.instances")
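Note that this setting is only present when it was set explicitly (or by the cluster manager); under dynamic allocation it may be missing entirely. A minimal sketch of reading it defensively, using a hypothetical helper that works on any dict-like view of the Spark conf (e.g. built from `sc.getConf().getAll()`):

```python
def configured_executor_count(conf, default=1):
    """Read the configured executor count from a dict-like of Spark settings.

    Hypothetical helper for illustration: returns `default` when
    spark.executor.instances is not set, e.g. under dynamic allocation.
    """
    value = conf.get("spark.executor.instances")
    return int(value) if value is not None else default

# Example with a plain dict standing in for the Spark conf:
n = configured_executor_count({"spark.executor.instances": "4"})
```

This only tells you what was requested, not how many executors are actually alive, which is why the status-tracker approach below is useful as a cross-check.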

Edit (Python):

%python
# Access the underlying Scala SparkContext through py4j.
sc = spark._jsc.sc()
# Subtract 1 to exclude the driver from the executor count.
n_workers = len([executor.host() for executor in sc.statusTracker().getExecutorInfos()]) - 1

print(n_workers)

As Danny mentioned in the comments, if you want to cross-verify them you can use the statements below.

%python
sc = spark._jsc.sc()

# Prints all the executors plus the driver.
result1 = sc.getExecutorMemoryStatus().keys()
# Executor count, excluding the driver.
result2 = len([executor.host() for executor in sc.statusTracker().getExecutorInfos()]) - 1

print(result1, end='\n')
print(result2)

Example result:

Set(10.172.249.9:46467)
0
answered Oct 03 '22 by Ram Ghadiyaram

You can also get the number of executors via the Spark REST API: https://spark.apache.org/docs/latest/monitoring.html#rest-api

You can query /applications/[app-id]/executors, which returns a list of all active executors for the given application.


PS: When spark.dynamicAllocation.enabled is true, spark.executor.instances may not equal the number of currently available executors, but this API always returns the correct value.
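To turn that REST response into a worker count you still have to exclude the driver, which appears in the list with the id "driver". A sketch of the counting logic, using a hypothetical trimmed-down sample payload in place of a live GET to the endpoint:

```python
import json

def active_executor_count(executors_json):
    """Count non-driver active executors from the REST API payload.

    `executors_json` is the JSON list returned by
    GET /applications/[app-id]/executors; each entry carries an "id"
    field, and the driver's id is the string "driver".
    """
    return sum(1 for e in executors_json if e.get("id") != "driver")

# Hypothetical sample payload, trimmed to the only field used here:
sample = json.loads('[{"id": "driver"}, {"id": "0"}, {"id": "1"}]')
n_workers = active_executor_count(sample)
```

In a real job you would fetch the JSON from the driver's monitoring endpoint (by default the UI on port 4040, or the history server) and pass the parsed list in.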

answered Oct 03 '22 by Hunger