How to use the programmatic spark submit capability

There is a somewhat recent (Spring 2015) feature apparently intended to allow submitting a Spark job programmatically.

Here is the JIRA https://issues.apache.org/jira/browse/SPARK-4924

However, there is uncertainty (count me among the uncertain) about how to actually use these features. Here are the last comments in the JIRA:

[screenshot of the closing JIRA comments]

When the actual author of this work was asked to explain further, the answer was to "look in the API docs".

The "user document" is the Spark API documentation.

The author did not provide further details and apparently feels the whole issue is self-explanatory. If anyone can connect the dots here - specifically, where in the API docs this newer Spark Submit capability is described - it would be appreciated.

Here is some of the info I am looking for - pointers to the following:

  • What capabilities have been added to the Spark API
  • How do we use them
  • Any examples / other relevant documentation and/or code

Update: The SparkLauncher referred to in the accepted answer does launch a simple app under trivial (master=local[*]) conditions. It remains to be seen how usable it will be on an actual cluster. After adding a print statement to the linked code:

println("launched.. and waiting..") spark.waitFor()

We do see:

launched.. and waiting..

Well, this is probably a small step forward. I will update this question as I move towards a real clustered environment. A hedged sketch of what a cluster submission might look like follows.
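For reference, here is a sketch of what a clustered submission might look like with the same SparkLauncher builder; the YARN master, deploy mode, paths, and memory setting below are illustrative assumptions, not tested settings:

import org.apache.spark.launcher.SparkLauncher

object ClusterLauncher extends App {
  // All values here are hypothetical - adjust for your cluster
  val process = new SparkLauncher()
    .setSparkHome("/opt/spark")
    .setAppResource("hdfs:///apps/example-assembly-1.0.jar")
    .setMainClass("MySparkApp")
    .setMaster("yarn")
    .setDeployMode("cluster")
    .setConf(SparkLauncher.EXECUTOR_MEMORY, "2g")
    .launch()
  process.waitFor()
}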

asked May 15 '16 by WestCoastProjects


People also ask

What happens when we use spark submit?

When you run spark-submit, a driver program is launched. The driver requests resources from the cluster manager, and at the same time it initiates the main program of the user's processing code.

How do I submit a python code in spark submit?

The Apache Spark binary comes with a spark-submit.sh script for Linux and Mac, and a spark-submit.cmd command file for Windows. These scripts live in the $SPARK_HOME/bin directory and are used to submit a PySpark file (a .py file, i.e. Spark with Python) to the cluster.
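For example, a minimal invocation might look like the following (the file name and master URL are illustrative assumptions):

$SPARK_HOME/bin/spark-submit --master local[*] my_pyspark_app.py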

How do I run spark submit in client mode?

You can submit a Spark batch application in cluster mode (the default) or client mode, either from inside the cluster or from an external client. In cluster mode the driver runs on a host in your driver resource group, and the spark-submit syntax is --deploy-mode cluster. In client mode the driver runs on the machine that submits the application, and the syntax is --deploy-mode client.
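A minimal client-mode submission might look like this (the master URL and jar name are illustrative assumptions):

$SPARK_HOME/bin/spark-submit --master yarn --deploy-mode client my-app.jar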


1 Answer

Looking at the details of the pull request, it seems that the functionality is provided by the SparkLauncher class, described in the API docs here.

public class SparkLauncher extends Object

Launcher for Spark applications.

Use this class to start Spark applications programmatically. The class uses a builder pattern to allow clients to configure the Spark application and launch it as a child process.

The API docs are rather minimal, but I found a blog post that gives a worked example (code also available in a GitHub repo). I have copied a simplified version of the example below (untested) in case the links go stale:

import org.apache.spark.launcher.SparkLauncher

object Launcher extends App {
  val spark = new SparkLauncher()
    .setSparkHome("/home/user/spark-1.4.0-bin-hadoop2.6")
    .setAppResource("/home/user/example-assembly-1.0.jar")
    .setMainClass("MySparkApp")
    .setMaster("local[*]")
    .launch();
  spark.waitFor();
}
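Note that launch() returns a plain java.lang.Process, so the launched application's output is not echoed automatically; you have to read it from the child process yourself. Here is a minimal, untested sketch of doing that (the stream handling is an assumption about what you want to do with the output):

import org.apache.spark.launcher.SparkLauncher
import scala.io.Source

object LauncherWithOutput extends App {
  val process = new SparkLauncher()
    .setSparkHome("/home/user/spark-1.4.0-bin-hadoop2.6")
    .setAppResource("/home/user/example-assembly-1.0.jar")
    .setMainClass("MySparkApp")
    .setMaster("local[*]")
    .launch()

  // spark-submit writes its log output to stderr; echo it so the launch is visible
  Source.fromInputStream(process.getErrorStream).getLines().foreach(println)
  process.waitFor()
}

From Spark 1.6 onwards there is also SparkLauncher.startApplication(...), which returns a SparkAppHandle for monitoring and controlling the application instead of a raw Process.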

See also:

  • Another tutorial blog post / review of the feature
  • A book chapter on the topic
answered Sep 17 '22 by DNA