Job 65 cancelled because SparkContext was shut down

I'm working on a shared Apache Zeppelin server. Almost every day, I try to run a command and get this error: Job 65 cancelled because SparkContext was shut down

I would love to learn more about what causes the SparkContext to shut down. My understanding is that Zeppelin is a Kubernetes app that sends commands to a remote machine for processing.

When a SparkContext shuts down, does that mean my bridge to the Spark cluster is down? And, if that's the case, what could cause the bridge to the Spark cluster to go down?

In this example, it happened when I was trying to upload data to S3.

This is the code:

// readParquet and LocalDate come from the asker's own helpers/imports
val myfiles = readParquet(
    startDate = new LocalDate(2020, 4, 1),
    endDate = new LocalDate(2020, 4, 7)
)

// register the DataFrame under the name referenced in the SQL below
myfiles.createOrReplaceTempView("myfiles")

val mySQLDF = spark.sql(s"""
    select [6 columns]
    from myfiles 
    join [other table]
    on [join_condition]
"""
)

mySQLDF.write.option("maxRecordsPerFile", 1000000).parquet(path)
// mySQLDF has 3M rows and they're all strings or dates

This is the stack trace:

org.apache.spark.SparkException: Job aborted.
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
  at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:156)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
  at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
  at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
  at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:566)
  ... 48 elided
Caused by: org.apache.spark.SparkException: Job 44 cancelled because SparkContext was shut down
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:972)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:970)
  at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
  at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:970)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:2286)
  at org.apache.spark.util.EventLoop.stop(EventLoop.scala:84)
  at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:2193)
  at org.apache.spark.SparkContext$$anonfun$stop$6.apply$mcV$sp(SparkContext.scala:1949)
  at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
  at org.apache.spark.SparkContext.stop(SparkContext.scala:1948)
  at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$MonitorThread.run(YarnClientSchedulerBackend.scala:121)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:777)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:167)
  ... 70 more
asked May 16 '20 by Cauder

1 Answer

Your job is getting aborted at the write step. Job aborted. is the exception message for that failure, which is what leads to the SparkContext being shut down.

Look into optimising the write step; maxRecordsPerFile might be the culprit, so try a lower number. You currently allow up to 1M records per file!
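
As a minimal sketch (reusing the mySQLDF and path names from the question; the repartition count and records-per-file value below are illustrative placeholders, not tuned numbers):

// Hedged sketch: cap each output file at ~100k rows instead of 1M and
// control the number of write tasks. Tune both numbers for your data.
val outputDF = mySQLDF.repartition(24)   // fewer, more even write tasks

outputDF.write
  .option("maxRecordsPerFile", 100000)   // smaller files, less pressure per task
  .parquet(path)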


In general, Job ${job.jobId} cancelled because SparkContext was shut down just means an exception occurred from which the DAG could not continue, so the job has to error out. It is the Spark scheduler that throws this error when it faces an exception; that might be an exception unhandled in your code, or a job failure for any other reason. And since the DAG scheduler is stopped, the entire application gets stopped (this message is part of the cleanup).


To your questions -

When a SparkContext shuts down, does that mean my bridge to the Spark cluster is down?

SparkContext represents the connection to a Spark cluster, so if it is dead you can't run a job on it anymore, because you have lost the link! On Zeppelin, you can just restart the SparkContext (Menu -> Interpreter -> Spark Interpreter -> restart).
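
If you are not sure whether the interpreter needs a restart, a quick hedged check from a notebook cell is to run a trivial job against the context; spark here is assumed to be the SparkSession the Zeppelin Spark interpreter exposes:

import scala.util.Try

// Run a tiny job; if the SparkContext has been shut down this fails
// and we report the context as unusable.
val contextAlive = Try(spark.sparkContext.parallelize(1 to 1).count()).isSuccess
println(s"SparkContext usable: $contextAlive")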

And, if that's the case, what could cause the bridge to the Spark cluster to go down?

With a SparkException/Error in a job, or manually by calling sc.stop().
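
As a rough illustration (a self-contained sketch, not the asker's setup), stopping the context from another thread while a job is still running reproduces the same message locally:

import org.apache.spark.SparkException
import org.apache.spark.sql.SparkSession

// Hypothetical local reproduction: names and timings are illustrative only.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("sparkcontext-shutdown-demo")
  .getOrCreate()
val sc = spark.sparkContext

// Stop the context after 2 seconds, simulating the cluster manager killing
// the application or another user restarting the shared interpreter.
new Thread(new Runnable {
  override def run(): Unit = { Thread.sleep(2000); sc.stop() }
}).start()

try {
  // A deliberately slow job that is still running when sc.stop() fires.
  sc.parallelize(1 to 100, 10).map { i => Thread.sleep(1000); i }.count()
} catch {
  case e: SparkException =>
    // Prints something like: "Job 0 cancelled because SparkContext was shut down"
    println(e.getMessage)
}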

answered Nov 14 '22 by Ani Menon