 

How to change job/stage description in web UI?


When I run a job on Apache Spark, the web UI gives a view similar to this:

[Screenshot: Spark web UI showing a job's stages described by source file and line numbers]

While this is incredibly useful for me as a developer to see where things are, I think the line numbers in the stage description would not be quite as useful for my support team. To make their job easier, I would like the ability to provide a bespoke name for each stage of my job, as well as for the job itself, like so:

[Mock-up: the same view with custom, human-readable names for the job and each stage]

Is this something that can be done in Spark? If so, how would I do so?

asked Jan 27 '17 by Joe C

People also ask

What is job stage in Spark?

Job. A job comprises several stages. When Spark encounters a function that requires a shuffle, it creates a new stage. Transformation functions like reduceByKey(), join(), etc. trigger a shuffle and result in a new stage. Spark also creates a stage when you read a dataset.
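For illustration, here is a minimal sketch (assuming an existing SparkContext named sc) where the reduceByKey shuffle introduces a stage boundary:

  // parallelizing and mapping the data stays within the first stage
  val pairs = sc.parallelize(Seq("a", "b", "a", "c")).map(w => (w, 1))
  // reduceByKey requires a shuffle, so Spark creates a new stage here
  val counts = pairs.reduceByKey(_ + _)
  counts.collect()  // triggers the job; the web UI shows two stages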

What are some of the things you can monitor in the Spark Web UI?

Apache Spark provides a suite of web UIs (Jobs, Stages, Tasks, Storage, Environment, Executors, and SQL) to monitor the status of your Spark/PySpark application, the resource consumption of the Spark cluster, and Spark configurations.

How do you keep the Spark Web UI alive?

The web UI is intrinsically tied to the SparkContext, so if you do not call .stop() and keep your application alive, the UI should remain alive. If you need to view the logs, those should still be persisted to the server.
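As a rough sketch (assuming a plain Scala driver with an existing SparkContext sc; the indefinite sleep is just one way to keep the process alive):

  // run your jobs as usual
  sc.parallelize(0 to 9).count()

  // do NOT call sc.stop(); keep the driver process alive instead,
  // so the web UI (by default on port 4040 of the driver) stays reachable
  Thread.sleep(Long.MaxValue)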


2 Answers

This is where one of the lesser-known features of Spark Core, local properties, comes in very handy.

Spark SQL uses it to group different Spark jobs under a single structured query, so you can use the SQL tab and navigate between them easily.

You can control local properties using SparkContext.setLocalProperty:

Set a local property that affects jobs submitted from this thread, such as the Spark fair scheduler pool. User-defined properties may also be set here. These properties are propagated through to worker tasks and can be accessed there via org.apache.spark.TaskContext#getLocalProperty.

The web UI uses two local properties:

  • callSite.short in the Jobs tab (which is exactly what you want)
  • callSite.long in the Job Details page.

Sample Usage

scala> sc.setLocalProperty("callSite.short", "callSite.short")

scala> sc.setLocalProperty("callSite.long", "this is callSite.long")

scala> sc.parallelize(0 to 9).count
res2: Long = 10

And the result in the web UI:

[Screenshot: Jobs tab in the web UI showing callSite.short]

Click a job to see its details, where you can find the longer call site, i.e. callSite.long.

[Screenshot: Job details page in the web UI showing callSite.long]

Here comes the Stages tab.

[Screenshot: Stages tab in the web UI]

answered by Jacek Laskowski

You can use the following APIs to set and unset the stage names:

  • https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/SparkContext.html#setCallSite-java.lang.String-
  • https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/SparkContext.html#clearCallSite--
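For example, a rough sketch using these two methods (the stage names and input path are made up for illustration):

  sc.setCallSite("Load raw events")
  val events = sc.textFile("/data/events.csv")   // stages from this point show "Load raw events"

  sc.setCallSite("Count events per user")
  events.map(line => (line.split(",")(0), 1)).reduceByKey(_ + _).count()

  sc.clearCallSite()   // later jobs fall back to the default file/line descriptions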

Also, Spark supports the concept of job groups within an application; the following APIs can be used to set and unset job group names:

  • https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/SparkContext.html#setJobGroup-java.lang.String-java.lang.String-boolean-
  • https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/SparkContext.html#clearJobGroup--
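A minimal sketch of job groups (the group id and description are illustrative):

  // group and describe all jobs submitted from this thread;
  // interruptOnCancel = true lets cancelJobGroup interrupt running tasks
  sc.setJobGroup("nightly-etl", "Nightly ETL for the reporting database", interruptOnCancel = true)
  sc.parallelize(0 to 9).count()   // listed under the "nightly-etl" group
  sc.clearJobGroup()               // subsequent jobs are no longer grouped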

The job description within a job group can also be configured using the following API:

  • https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/SparkContext.html#setJobDescription-java.lang.String-
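For example (the description string is arbitrary):

  sc.setJobDescription("Aggregate daily totals")   // shown in the Description column of the Jobs tab
  sc.parallelize(0 to 9).count()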

answered by Anuj