Launching a Spark program using an Oozie workflow

Tags: scala, workflow, apache-spark, oozie

I am working with a Scala program that uses Spark packages. Currently I run the program from the gateway with this bash command:

    /homes/spark/bin/spark-submit --master yarn-cluster --class "com.xxx.yyy.zzz" --driver-java-options "-Dyyy.num=5" a.jar arg1 arg2

I would like to start running this job with Oozie, and I have a few questions:

Where should I put the spark-submit executable? On HDFS? How do I define the Spark action? Where should the --driver-java-options appear? What should the Oozie action look like? Is it similar to the one appearing here?

asked Mar 24 '15 13:03

Shaharg

1 Answer

If your version of Oozie is recent enough, you can use Oozie's Spark action:

https://github.com/apache/oozie/blob/master/client/src/main/resources/spark-action-0.1.xsd
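As a rough, untested sketch of what an action written against that schema might look like for the command in the question: the action name `spark-job`, the transition targets `end` and `fail`, and the `${job_tracker}` property are assumptions made for illustration, and the other `${...}` properties reuse the names from the Java example in this answer. The driver option from the question is passed through `<spark-opts>`, written without quotes because the value contains no whitespace.

```xml
<action name="spark-job">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <!-- job_tracker is a hypothetical property name; name_node matches the Java example -->
        <job-tracker>${job_tracker}</job-tracker>
        <name-node>${name_node}</name-node>

        <!-- same master as in the original spark-submit command -->
        <master>yarn-cluster</master>
        <name>spark-job</name>

        <!-- main class (com.xxx.yyy.zzz) and the application jar -->
        <class>${spark_main_class}</class>
        <jar>${spark_app_file}</jar>

        <!-- extra spark-submit options, including the driver JVM option -->
        <spark-opts>--driver-java-options -Dyyy.num=5</spark-opts>

        <!-- application arguments, in order -->
        <arg>arg1</arg>
        <arg>arg2</arg>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

With this kind of action there is no spark-submit executable to place anywhere; Oozie launches Spark from the jars available to the workflow, so only the application jar needs to be reachable, typically on HDFS.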

Otherwise, you need to run a Java action that calls Spark, something like:

    <java>
        <main-class>org.apache.spark.deploy.SparkSubmit</main-class>

        <arg>--class</arg>
        <arg>${spark_main_class}</arg> <!-- the main class, here com.xxx.yyy.zzz -->

        <arg>--deploy-mode</arg>
        <arg>cluster</arg>

        <arg>--master</arg>
        <arg>yarn</arg>

        <arg>--queue</arg>
        <arg>${queue_name}</arg> <!-- depends on your Oozie config -->

        <arg>--num-executors</arg>
        <arg>${spark_num_executors}</arg>

        <arg>--executor-cores</arg>
        <arg>${spark_executor_cores}</arg>

        <arg>${spark_app_file}</arg> <!-- jar that contains your Spark job, written in Scala -->

        <arg>${input}</arg> <!-- first application argument -->
        <arg>${output}</arg> <!-- second application argument -->

        <file>${spark_app_file}</file>

        <file>${name_node}/user/spark/share/lib/spark-assembly.jar</file>
    </java>
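The question also asks where `--driver-java-options` should go. With the `SparkSubmit` approach, one plausible placement is as one more pair of `<arg>` elements before the application jar, since everything after the jar is handed to the application itself. This is an untested fragment under that assumption, not a complete action:

```xml
<!-- goes among the spark-submit options, before the application jar -->
<arg>--driver-java-options</arg>
<arg>-Dyyy.num=5</arg>

<!-- the application jar and its arguments follow -->
<arg>${spark_app_file}</arg>
```

Each `<arg>` is passed as a single argument, so the option value needs no shell-style quoting here.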

answered Sep 21 '22 16:09

nurieta
