 

Spark streaming job fails after getting stopped by Driver

I have a Spark Streaming job that reads data from Kafka and performs some operations on it. I am running the job on a YARN cluster (Spark 1.4.1) that has two nodes, each with 16 GB of RAM and 16 cores.

I pass these configuration options to spark-submit:

--master yarn-cluster --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 3

The job returns this error and exits after running for a short while:

INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 11,
(reason: Max number of executor failures reached)

.....

ERROR scheduler.ReceiverTracker: Deregistered receiver for stream 0:
Stopped by driver

Update:

These logs were found as well:

INFO yarn.YarnAllocator: Received 3 containers from YARN, launching executors on 3 of them.....

INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down.

....

INFO yarn.YarnAllocator: Received 2 containers from YARN, launching executors on 2 of them.

INFO yarn.ExecutorRunnable: Starting Executor Container.....

INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down...

INFO yarn.YarnAllocator: Completed container container_e10_1453801197604_0104_01_000006 (state: COMPLETE, exit status: 1)

INFO yarn.YarnAllocator: Container marked as failed: container_e10_1453801197604_0104_01_000006. Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_e10_1453801197604_0104_01_000006
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
    at org.apache.hadoop.util.Shell.run(Shell.java:487)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Container exited with a non-zero exit code 1

What might be the reasons for this? I'd appreciate some help.

Thanks

asked Feb 01 '16 by void

People also ask

What happens when Spark driver fails?

The driver node's purpose is to run the Spark Streaming application; however, if it fails, the SparkContext will be lost, and the executors will be unable to access any in-memory data.

Is Spark streaming deprecated?

Now that the Direct API of Spark Streaming (we currently have version 2.3.2) is deprecated, and we recently added the Confluent Platform (which comes with Kafka 2.2.0) to our project, we plan to migrate these applications.

Why does Spark streaming not offer fault tolerance for Dstreams?

Since Spark Streaming is built on Spark, it enjoys the same fault tolerance for worker nodes. However, the demand for high uptime in a Spark Streaming application requires that it also be able to recover from failures of the driver process, which is the main application process that coordinates all the workers.
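
Driver recovery is typically handled with checkpointing. Below is a minimal sketch (not taken from the question) of the getOrCreate pattern in the Java API; the checkpoint directory, app name, and batch interval are placeholder values. On YARN in cluster mode, the ApplicationMaster hosting the driver is restarted up to spark.yarn.maxAppAttempts times, and the new driver attempt can then rebuild the streaming context from the checkpoint.

    // Minimal sketch of checkpoint-based driver recovery (Java API).
    // The checkpoint path, app name, and batch interval are placeholders,
    // not values from the question.
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class RecoverableStreamingJob {
        public static void main(String[] args) {
            final String checkpointDir = "hdfs:///user/me/checkpoints/my-streaming-app";

            // On the first run this creates a fresh context; after a driver failure it
            // rebuilds the context (and the DStream lineage) from the checkpoint data.
            JavaStreamingContext jssc = JavaStreamingContext.getOrCreate(checkpointDir, () -> {
                SparkConf conf = new SparkConf().setAppName("ApplicationName");
                JavaStreamingContext newContext =
                        new JavaStreamingContext(conf, Durations.seconds(10));
                newContext.checkpoint(checkpointDir);
                // ... define the Kafka input stream and transformations here ...
                return newContext;
            });

            jssc.start();
            jssc.awaitTermination();
        }
    }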


1 Answer

Can you please show your Scala/Java code that reads from Kafka? I suspect you are probably not creating your SparkConf correctly.

Try something like

SparkConf sparkConf = new SparkConf().setAppName("ApplicationName");

Also try running the application in yarn-client mode and share the output.
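
For context, a minimal end-to-end sketch of a receiver-based Kafka consumer on Spark 1.4.1 (Java API) might look like the following; the ZooKeeper quorum, consumer group, topic name, and batch interval are placeholders rather than values from the question. setMaster() is deliberately omitted so that the master passed to spark-submit (yarn-cluster or yarn-client) is used.

    // Minimal sketch of a receiver-based Kafka consumer on Spark 1.4.1 (Java API).
    // The ZooKeeper quorum, consumer group, topic name, and batch interval are
    // placeholders -- they are not taken from the question.
    import java.util.Collections;
    import java.util.Map;

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    public class KafkaStreamingSketch {
        public static void main(String[] args) {
            // No setMaster() here: the master (yarn-cluster / yarn-client) comes from spark-submit.
            SparkConf sparkConf = new SparkConf().setAppName("ApplicationName");
            JavaStreamingContext jssc =
                    new JavaStreamingContext(sparkConf, Durations.seconds(10));

            // Topic name -> number of receiver threads for that topic.
            Map<String, Integer> topics = Collections.singletonMap("my-topic", 1);
            JavaPairReceiverInputDStream<String, String> messages =
                    KafkaUtils.createStream(jssc, "zk-host:2181", "my-consumer-group", topics);

            // Example operation: print the message values of each batch.
            messages.map(record -> record._2()).print();

            jssc.start();
            jssc.awaitTermination();
        }
    }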

answered Sep 29 '22 by Faisal Ahmed Siddiqui