
Spark-submit not working when application jar is in hdfs

I'm trying to run a Spark application using bin/spark-submit. When I reference my application jar on my local filesystem, it works. However, when I copy my application jar to a directory in HDFS, I get the following exception:

Warning: Skip remote jar hdfs://localhost:9000/user/hdfs/jars/simple-project-1.0-SNAPSHOT.jar. java.lang.ClassNotFoundException: com.example.SimpleApp

Here's the command:

$ ./bin/spark-submit --class com.example.SimpleApp --master local hdfs://localhost:9000/user/hdfs/jars/simple-project-1.0-SNAPSHOT.jar

I'm using Hadoop version 2.6.0 and Spark version 1.2.1.
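Before submitting, it can help to confirm that the jar actually exists at that HDFS path and contains the named class. A minimal sketch, reusing the URI and class name from the question above:

```shell
# List the jar in HDFS to rule out a wrong path or a permissions issue
hdfs dfs -ls hdfs://localhost:9000/user/hdfs/jars/

# Copy the jar locally and verify the class is really inside it
hdfs dfs -get hdfs://localhost:9000/user/hdfs/jars/simple-project-1.0-SNAPSHOT.jar /tmp/
unzip -l /tmp/simple-project-1.0-SNAPSHOT.jar | grep SimpleApp
```

If the class listing is missing, the ClassNotFoundException comes from the jar itself rather than from how it is referenced.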

asked Feb 26 '15 by dilm


1 Answer

The only way it worked for me was to submit with:

--master yarn-cluster
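A sketch of the full command under that setting, reusing the class and jar path from the question (this assumes a working YARN setup; the paths are the question's, not verified here):

```shell
# In yarn-cluster mode the driver runs inside the cluster, where the
# HDFS jar can be localized; local mode skips remote jars (hence the
# "Skip remote jar" warning).
./bin/spark-submit \
  --class com.example.SimpleApp \
  --master yarn-cluster \
  hdfs://localhost:9000/user/hdfs/jars/simple-project-1.0-SNAPSHOT.jar
```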

answered Sep 30 '22 by Romain