 

How can I tell if my Spark job is progressing?

I have a Spark job running on YARN, and it appears to just hang without doing any computation.

Here's what YARN says when I run yarn application -status <APPLICATION ID>:

Application Report : 
Application-Id : applicationID
Application-Name : test app
Application-Type : SPARK
User : ec2-user
Queue : default
Start-Time : 1491005660004
Finish-Time : 0
Progress : 10%
State : RUNNING
Final-State : UNDEFINED
Tracking-URL : http://<ip>:4040
RPC Port : 0
AM Host : <host ip>
Aggregate Resource Allocation : 36343926 MB-seconds, 9818 vcore-seconds
Log Aggregation Status : NOT_START
Diagnostics : 

And when I run yarn application -list, it says the application is RUNNING. But I'm not sure I trust that. When I go to the Spark web UI, I see only one stage for the entire few hours the job has been running:

[Screenshot: Spark web UI showing a single stage]

Also, when I click on the "Stages" tab, I see nothing running:

[Screenshot: Stages tab with nothing running]

How do I ensure that my application is actually running and that YARN is not lying to me?

I would actually prefer for this to throw an error rather than keep me waiting to see if the job is actually running. How do I do that?

asked Apr 01 '17 by makansij

People also ask

How do I know if my Spark job is progressing?

You can view the status of a Spark Application that is created for the notebook in the status widget on the notebook panel. The widget also displays links to the Spark UI, Driver Logs, and Kernel Log. Additionally, you can view the progress of the Spark job when you run the code.
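The same information is available outside a notebook. Below is a minimal sketch that polls the YARN ResourceManager REST API for an application's state and progress; the host rm-host, the port, and the application id are placeholders you would substitute for your cluster:

import json
import urllib.request

# Placeholders: substitute your ResourceManager host/port and application id.
RM = "http://rm-host:8088"
APP_ID = "application_1491005660004_0001"

# GET /ws/v1/cluster/apps/<id> returns the same report as yarn application -status.
with urllib.request.urlopen(f"{RM}/ws/v1/cluster/apps/{APP_ID}") as resp:
    app = json.load(resp)["app"]

print(app["state"], app["finalStatus"], app["progress"])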

Why is my Spark job taking so long?

Sometimes, Spark runs slowly because there are too many concurrent tasks running. The capacity for high concurrency is normally a beneficial feature, as it provides Spark-native fine-grained sharing, maximizing resource utilization while cutting down query latencies; past a certain point, however, the scheduling and shuffle overhead of all those tasks outweighs the gains.
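If over-scheduling is the suspect, one common knob is the number of shuffle partitions, which bounds how many tasks a shuffle stage spawns. A hedged PySpark sketch, with purely illustrative values:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("concurrency-tuning-sketch")
         # Illustrative value: fewer shuffle partitions means fewer
         # concurrent tasks per shuffle stage.
         .config("spark.sql.shuffle.partitions", "64")
         # Illustrative cap on executors when dynamic allocation is enabled.
         .config("spark.dynamicAllocation.maxExecutors", "10")
         .getOrCreate())

Whether lowering these values helps depends entirely on the workload; measure before and after.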

How is a Spark job split into stages?

A job comprises several stages. When Spark encounters a function that requires a shuffle, it creates a new stage. Transformation functions like reduceByKey() and join() trigger a shuffle and result in a new stage. Spark will also create a stage when you read a dataset.
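You can inspect these stage boundaries yourself: in the RDD API, toDebugString() prints the lineage, and each extra indentation level marks a shuffle (hence a new stage). A small self-contained sketch:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("stage-demo").getOrCreate()
sc = spark.sparkContext

# map() is a narrow transformation; reduceByKey() forces a shuffle,
# so this pipeline is split into two stages.
pairs = sc.parallelize(range(100)).map(lambda x: (x % 3, 1))
counts = pairs.reduceByKey(lambda a, b: a + b)

# Each indentation level in the printed lineage is a stage boundary.
print(counts.toDebugString().decode("utf-8"))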


2 Answers

On the Spark application UI

If you click on the link "parquet at Nativexxxx", it will show you the details for the running stage.

On that screen there is a column called "Input Size/Records". If your job is progressing, the number in that column will keep changing.

[Screenshot: stage details showing the Input Size/Records column]

It basically shows the number of records read by your executors so far.
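You can do the same check from the driver without watching the UI. Here is a minimal sketch using PySpark's StatusTracker; it would have to run inside the same driver process (for example on a monitoring thread started before the long action), and the poll count and interval are arbitrary choices:

import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("progress-watchdog").getOrCreate()
tracker = spark.sparkContext.statusTracker()

# If the completed-task counts never move between polls, the job is stuck.
for _ in range(10):          # arbitrary number of polls
    for stage_id in tracker.getActiveStageIds():
        info = tracker.getStageInfo(stage_id)
        if info is not None:
            print(f"stage {stage_id}: "
                  f"{info.numCompletedTasks}/{info.numTasks} tasks done, "
                  f"{info.numActiveTasks} active")
    time.sleep(30)           # arbitrary poll interval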

answered Oct 15 '22 by Sanchit Grover


If you go to the Spark UI and open the "Executors" tab, you will see the list of executors your job is running on. Next to each executor ID and address there is a "Logs" column with "stdout" and "stderr" links. Click on stdout and you can see the logs that were written in that container while your job was running.
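If the web UI is unreachable, the same stdout/stderr can be pulled with the yarn logs CLI, provided log aggregation has collected them (the report above shows Log Aggregation Status: NOT_START, so this may only work once aggregation runs). A small sketch wrapping the CLI from Python, with a placeholder application id:

import subprocess

# Placeholder id; requires YARN log aggregation to have collected the logs.
app_id = "application_1491005660004_0001"
result = subprocess.run(
    ["yarn", "logs", "-applicationId", app_id],
    capture_output=True, text=True, check=True)
print(result.stdout)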

answered Oct 15 '22 by BadBoy777