
Spark - How to identify a failed Job through 'SparkLauncher'

Tags:

apache-spark

I am using Spark 2.0, and sometimes my job fails due to problems with the input. For example, I am reading CSV files from an S3 folder based on the date, and if there is no data for the current date, my job has nothing to process, so it throws an exception like the one below. This gets printed in the driver's logs.

Exception in thread "main" org.apache.spark.sql.AnalysisException: Path does not exist: s3n://data/2016-08-31/*.csv;
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:40)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:58)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:174)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:67)
...
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
 16/09/03 10:51:54 INFO SparkContext: Invoking stop() from shutdown hook
 16/09/03 10:51:54 INFO SparkUI: Stopped Spark web UI at http://192.168.1.33:4040
 16/09/03 10:51:54 INFO StandaloneSchedulerBackend: Shutting down all executors
 16/09/03 10:51:54 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
Spark App app-20160903105040-0007 state changed to FINISHED
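
For reference, the read that triggers this is essentially the following (a minimal sketch; the class name and the date-based path construction are illustrative, not taken from the actual job):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import java.time.LocalDate;

public class DailyCsvJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("daily-csv-job").getOrCreate();

        // If no files exist under today's prefix, the read fails analysis with
        // "Path does not exist" before any Spark job is actually started.
        String inputPath = "s3n://data/" + LocalDate.now() + "/*.csv";
        Dataset<Row> input = spark.read().csv(inputPath);

        input.show();
        spark.stop();
    }
}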

However, despite this uncaught exception, my Spark job's status is 'FINISHED'. I would expect it to be in the 'FAILED' state because there was an exception. Why is it marked as FINISHED, and how can I find out whether the job failed or not?

Note: I am spawning the Spark jobs using SparkLauncher and listening to state changes through the AppHandle, but the state change I receive is FINISHED whereas I am expecting FAILED.
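
For reference, the launch looks roughly like this (a minimal sketch; the jar path, main class and master URL are placeholders, not from the actual setup):

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class Launcher {
    public static void main(String[] args) throws Exception {
        SparkAppHandle handle = new SparkLauncher()
                .setAppResource("/path/to/my-spark-job.jar")   // placeholder
                .setMainClass("com.example.MySparkJob")        // placeholder
                .setMaster("spark://master:7077")              // placeholder
                .startApplication(new SparkAppHandle.Listener() {
                    @Override
                    public void stateChanged(SparkAppHandle h) {
                        // This reports FINISHED even when the driver dies with the exception above.
                        System.out.println("Spark App " + h.getAppId() + " state changed to " + h.getState());
                    }

                    @Override
                    public void infoChanged(SparkAppHandle h) { }
                });

        // Keep the launcher JVM alive until the application reaches a final state.
        while (!handle.getState().isFinal()) {
            Thread.sleep(1000);
        }
    }
}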

Asked Sep 03 '16 by Yohan Liyanage


People also ask

How can you recognize failure in your Spark job?

When a Spark job or application fails, you can use the Spark logs to analyze the failures. The QDS UI provides links to the logs in the Application UI and Spark Application UI. If you are running the Spark job or application from the Analyze page, you can access the logs via the Application UI and Spark Application UI.

How do I track a Spark job?

Click Analytics > Spark Analytics > Open the Spark Application Monitoring Page. Click Monitor > Workloads, and then click the Spark tab. This page displays the user names of the clusters that you are authorized to monitor and the number of applications that are currently running in each cluster.

What happens after Spark job is submitted?

Once you do a Spark submit, a driver program is launched. The driver requests resources from the cluster manager and, at the same time, starts the main program of the user's processing application.



1 Answer

The FINISHED state you see is for the Spark application, not for a job. It is FINISHED because the Spark context was able to start and stop properly.

You can see job information using JavaSparkStatusTracker. For active jobs nothing extra is needed, since it has a getActiveJobIds() method.
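
For example (a minimal sketch, assuming sc is the JavaSparkContext shown below):

JavaSparkStatusTracker statusTracker = sc.statusTracker();

// IDs of the jobs that are currently running; no job group is required for this.
int[] activeJobIds = statusTracker.getActiveJobIds();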

To get finished/failed jobs, you will need to set up a job group ID in the thread from which you trigger the Spark execution:

JavaSparkContext sc;
...
// Tag every job triggered from this thread with a group ID that can be queried later.
sc.setJobGroup(MY_JOB_ID, "Some description");

Then, whenever you need to, you can read the status of each job within the specified job group:

JavaSparkStatusTracker statusTracker = sc.statusTracker();

// Look up the jobs that were tagged with the group ID set above.
for (int jobId : statusTracker.getJobIdsForGroup(MY_JOB_ID)) {
    final SparkJobInfo jobInfo = statusTracker.getJobInfo(jobId);
    final JobExecutionStatus status = jobInfo.status();
}

JobExecutionStatus can be one of RUNNING, SUCCEEDED, FAILED, or UNKNOWN; the last one covers the case where a job has been submitted but has not actually started yet.

Note: all of this is available from the Spark driver, i.e. the jar you are launching with SparkLauncher, so the code above should be placed inside that jar.

If you want to check from the SparkLauncher side whether there were any failures, you can make the application started from the jar exit with a non-zero exit code (for example, System.exit(1)) when a job failure is detected. The Process returned by SparkLauncher::launch has an exitValue() method, so you can detect whether it failed or not.
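
A minimal sketch of that check on the launcher side (the jar path and main class are placeholders):

import org.apache.spark.launcher.SparkLauncher;

public class LauncherExitCheck {
    public static void main(String[] args) throws Exception {
        Process spark = new SparkLauncher()
                .setAppResource("/path/to/my-spark-job.jar")  // placeholder
                .setMainClass("com.example.MySparkJob")       // placeholder
                .launch();

        // In real code you would also consume or redirect the process output.
        // waitFor() blocks until the driver JVM exits; a non-zero code means the
        // driver called System.exit(1) (or otherwise died) after detecting a failure.
        int exitCode = spark.waitFor();
        if (exitCode != 0) {
            System.err.println("Spark job failed with exit code " + exitCode);
        }
    }
}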

Answered Oct 10 '22 by Volodymyr Zubariev