When a new PySpark application is started, it creates a nice web UI with tabs for Jobs, Stages, Executors, etc. If I go to the Executors tab, I can see the full list of executors and some information about each one, such as number of cores, storage memory used vs. total, and so on.
My question is whether I can somehow access the same information (or at least part of it) programmatically from the application itself, e.g. with something like spark.sparkContext.<function_name_to_get_info_about_executors>()?
I've found a workaround that issues a URL request much like the web UI does, but I suspect I'm missing a simpler solution.
I'm using Spark 3.0.0.
The only way I've found so far seems hacky to me and involves scraping the same URL that the web UI queries, i.e. doing this:
import urllib.request
import json

sc = spark.sparkContext
# Same REST endpoint that backs the web UI's Executors tab
u = sc.uiWebUrl + '/api/v1/applications/' + sc.applicationId + '/allexecutors'
with urllib.request.urlopen(u) as url:
    executors_data = json.loads(url.read().decode())
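For reference, the endpoint returns a JSON array with one summary object per executor (the driver appears under the id "driver"). The field names below follow the REST API's ExecutorSummary type, though the exact set can vary between Spark versions, so it's worth printing executors_data once to check:

# Show each executor's core count and storage memory usage (bytes)
for e in executors_data:
    print(e['id'], e['totalCores'], e['memoryUsed'], '/', e['maxMemory'])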
Another option is to implement a SparkListener, overriding some or all of the onExecutor...() methods depending on your needs, and then register it at spark-submit time with --conf spark.extraListeners=<your listener class>.
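Note that a listener registered via spark.extraListeners must be a JVM class (Scala or Java) with a zero-argument or SparkConf-accepting constructor, so it can't be written in Python that way. Below is a rough Python sketch of the same idea that instead attaches a listener at runtime through PySpark's py4j plumbing; sc._gateway, sc._jsc and ensure_callback_server_started are internal APIs, so treat this as fragile:

from pyspark.java_gateway import ensure_callback_server_started

class ExecutorTrackingListener(object):
    # Called by the JVM whenever an executor joins or leaves
    def onExecutorAdded(self, event):
        print('executor added:', event.executorId())

    def onExecutorRemoved(self, event):
        print('executor removed:', event.executorId())

    def __getattr__(self, name):
        # No-op for every other SparkListenerInterface callback
        return lambda *args: None

    class Java:
        implements = ['org.apache.spark.scheduler.SparkListenerInterface']

sc = spark.sparkContext
ensure_callback_server_started(sc._gateway)  # lets the JVM call back into Python
sc._jsc.sc().addSparkListener(ExecutorTrackingListener())

Once registered, Spark invokes onExecutorAdded/onExecutorRemoved on the driver as executors come and go; the added event also exposes details such as event.executorInfo().totalCores().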
Your own solution is totally legit too; it just uses Spark's REST API.
Both are going to be quite involved, so pick your poison: parse long JSON payloads, or work through a hierarchy of Developer API objects.