I have a spark job running on YARN
and it appears to just hang and not be doing any computation.
Here's what YARN says when I run yarn application -status <APPLICATION ID>:
Application Report :
Application-Id : applicationID
Application-Name : test app
Application-Type : SPARK
User : ec2-user
Queue : default
Start-Time : 1491005660004
Finish-Time : 0
Progress : 10%
State : RUNNING
Final-State : UNDEFINED
Tracking-URL : http://<ip>:4040
RPC Port : 0
AM Host : <host ip>
Aggregate Resource Allocation : 36343926 MB-seconds, 9818 vcore-seconds
Log Aggregation Status : NOT_START
Diagnostics :
And when I check yarn application -list, it also says that it is RUNNING. But I'm not sure I trust that: when I go to the Spark web UI, I see only one stage for the entire few hours I've been running it.
Also, when I click on the "Stages" tab, I see nothing running.
How do I ensure that my application is actually running and that YARN is not lying to me?
I would actually prefer for this to throw an error rather than keep me waiting to see if the job is actually running. How do I do that?
You can view the status of the Spark application created for a notebook in the status widget on the notebook panel. The widget also displays links to the Spark UI, the driver logs, and the kernel log. Additionally, you can view the progress of the Spark job while the code runs.
Sometimes Spark runs slowly because too many concurrent tasks are competing for resources. The capacity for high concurrency is normally a beneficial feature, since it provides Spark-native fine-grained sharing that maximizes resource utilization while cutting down query latencies, but past a certain point the scheduling and shuffle overhead of very many small tasks can make a job crawl.
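If you suspect the job is drowning in tiny concurrent tasks, one sketch is to cap the parallelism when you build the session. The property names below are standard Spark settings, but the values are only placeholders you would tune for your own cluster, and on YARN you can equally pass the same properties with --conf on spark-submit:

    import org.apache.spark.sql.SparkSession

    // Sketch with placeholder values: limit concurrent tasks per executor and
    // the number of partitions, so the scheduler is not flooded with tiny tasks.
    val spark = SparkSession.builder()
      .appName("test app")
      .config("spark.executor.cores", "4")             // concurrent tasks per executor
      .config("spark.default.parallelism", "200")      // default RDD partition count
      .config("spark.sql.shuffle.partitions", "200")   // partitions produced by shuffles in Spark SQL
      .getOrCreate()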
A job comprises several stages. When Spark encounters a function that requires a shuffle, it creates a new stage. Wide transformations like reduceByKey(), join(), etc. trigger a shuffle and therefore result in a new stage. Spark will also create a stage when you read a dataset.
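As a minimal sketch (the input path and the word-count logic are just an assumed example), reading a file gives you one stage and the reduceByKey shuffle forces a second one:

    import org.apache.spark.sql.SparkSession

    // Minimal sketch: the textFile read is its own stage; reduceByKey needs a
    // shuffle, so Spark cuts the job into a second stage at that boundary.
    val spark = SparkSession.builder().appName("stage-demo").getOrCreate()
    val sc = spark.sparkContext
    val lines = sc.textFile("hdfs:///tmp/input.txt")   // assumed path; read stage
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)                              // shuffle boundary -> new stage
    counts.count()                                     // action that actually runs both stages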
On the Spark application UI, if you click on the stage link (e.g. "parquet at Nativexxxx"), it will show you the details for the running stage.
On that screen there is a column called "Input Size / Records". If your job is progressing, the number shown in that column keeps changing; it basically shows the number of records your executors have read so far.
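If you would rather check progress from the driver than keep refreshing the UI, one sketch (assuming "spark" is your active SparkSession, as it is in spark-shell) is to poll the status tracker, which exposes the same per-stage task counts the UI shows:

    // Sketch: poll the status tracker from the driver; if the completed-task
    // count never moves, the stage really is stuck, whatever YARN's state says.
    val tracker = spark.sparkContext.statusTracker
    tracker.getActiveStageIds().foreach { stageId =>
      tracker.getStageInfo(stageId).foreach { info =>
        println(s"stage $stageId (${info.name()}): " +
          s"${info.numCompletedTasks()} of ${info.numTasks()} tasks done, " +
          s"${info.numActiveTasks()} active, ${info.numFailedTasks()} failed")
      }
    }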
If you go to the Spark UI and open the "Executors" tab, you will see the list of executors your job is running on. Next to each executor ID and address there is a "Logs" column with "stdout" and "stderr" links. Click on stdout and you can see the logs that were written on that container while your job is running.
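As a related sketch (again assuming "spark" is your active SparkSession), the same status tracker can list the executors programmatically, which helps when the UI on port 4040 is hard to reach from outside the cluster:

    // Sketch: print each executor's host, port and running-task count, as an
    // alternative to reading the "Executors" tab in the Spark UI.
    val tracker = spark.sparkContext.statusTracker
    tracker.getExecutorInfos.foreach { exec =>
      println(s"executor ${exec.host()}:${exec.port()} is running ${exec.numRunningTasks()} task(s)")
    }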