How to submit a Scala job to Spark?

I have a Python script that I was able to submit to Spark in the following way:

/opt/spark/bin/spark-submit --master yarn-client test.py

Now I am trying to submit a Scala program in the same way:

/opt/spark/bin/spark-submit --master yarn-client test.scala

As a result I get the following error message:

Error: Cannot load main class from JAR file:/home/myname/spark/test.scala
Run with --help for usage help or --verbose for debug output

The Scala program itself is just a Hello World program:

object HelloWorld {
    def main(args: Array[String]): Unit = {
        println("Hello, world!")
    }
}

What am I doing wrong?

asked Jan 08 '16 by Roman

2 Answers

For starters you'll have to create a jar file; you cannot simply submit Scala source. If in doubt, see Getting Started with sbt (a minimal sketch is shown below).

After that, just add a --class parameter pointing to HelloWorld. Assuming no packages:

/opt/spark/bin/spark-submit --master yarn-client --class "HelloWorld" path_to.jar
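For reference, a minimal sbt project producing such a jar could look like the sketch below (the layout is standard sbt; the Scala and Spark versions are assumptions, so match them to whatever your cluster runs):

// build.sbt
name := "hello-world"

version := "1.0"

scalaVersion := "2.10.6"

// Only needed once the program actually uses Spark; "provided" keeps it out of the jar
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0" % "provided"

With HelloWorld.scala placed under src/main/scala/, running sbt package builds a jar under target/scala-2.10/ (named after the project name and version) that you can pass to spark-submit. If HelloWorld lived in a package such as com.example, you would pass --class "com.example.HelloWorld" instead.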
answered Nov 02 '22 by 3 revs

It depends on the cluster mode you are using.

Have a look at the generic command:

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

For yarn-client:

/opt/spark/bin/spark-submit \
  --class "HelloWorld" \
  --master yarn-client \
  your_jar_with_scala_file

Note that the options have to come before the application jar; anything that follows the jar is passed to your application as arguments.

Have a look at the Spark documentation for a better understanding.
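On Spark 2.x and later the same submission is usually written with an explicit deploy mode instead of the yarn-client master URL; a sketch, assuming the same jar and class name:

/opt/spark/bin/spark-submit \
  --class "HelloWorld" \
  --master yarn \
  --deploy-mode client \
  your_jar_with_scala_file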

answered Nov 02 '22 by Ravindra babu