Let's suppose we have a Spark driver program written like this:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class SimpleApp {
    public static void main(String[] args) {
        String logFile = "YOUR_SPARK_HOME/README.md"; // Should be some file on your system
        SparkConf conf = new SparkConf().setAppName("Simple Application");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> logData = sc.textFile(logFile).cache();

        long numAs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("a"); }
        }).count();

        long numBs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("b"); }
        }).count();

        System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
    }
}
and I want to run it on a YARN cluster. Can I avoid using spark-submit (supposing, of course, that I have access to one cluster node) and just specify in the context that I want to run on YARN? In other words, is it possible to launch the Spark client as a regular Java app leveraging YARN?
Use spark://HOST:PORT for a standalone cluster, replacing HOST and PORT with the host and port of the standalone cluster's master. Use local to run locally with one worker thread. Use local[K] to run locally with K worker threads, setting K to the number of cores you have locally.
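For example, assuming the SparkConf from the question above, the master URL can be set directly on the configuration before the context is created (the thread count, host, and port below are placeholders):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class MasterUrlExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("Simple Application");
        // Run locally with 4 worker threads (local[*] uses all available cores).
        conf.setMaster("local[4]");
        // Or target a standalone cluster master instead (host and port are placeholders):
        // conf.setMaster("spark://master-host:7077");
        System.out.println("Master: " + conf.get("spark.master"));
        JavaSparkContext sc = new JavaSparkContext(conf);
        sc.stop();
    }
}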
spark-submit is a shell command used to deploy a Spark application on a cluster.
What happens when a Spark job is submitted? When a client submits Spark application code, the driver implicitly converts the code containing transformations and actions into a logical directed acyclic graph (DAG).
Deploy mode specifies where the driver runs in the deployment environment. It can be one of the following: client (default), where the driver runs on the machine from which the Spark application was launched, or cluster, where the driver runs on a node inside the cluster (on YARN, inside the application master).
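For reference, here is a minimal sketch of what such a submission can look like when driven from a plain Java program: it simply shells out to spark-submit. The jar path and main class are placeholders, and spark-submit is assumed to be on the PATH of a machine with access to the YARN cluster:

import java.util.Arrays;

public class SubmitViaShell {
    public static void main(String[] args) throws Exception {
        // Placeholder jar path and main class; yarn-cluster runs the driver inside the cluster.
        ProcessBuilder pb = new ProcessBuilder(Arrays.asList(
                "spark-submit",
                "--class", "SimpleApp",
                "--master", "yarn-cluster",
                "/path/to/simple-app.jar"));
        pb.inheritIO(); // forward spark-submit's output to this console
        int exitCode = pb.start().waitFor();
        System.out.println("spark-submit exited with " + exitCode);
    }
}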
Here is another official way to do it.
Spark Launcher - Library for launching Spark applications.
This library allows applications to launch Spark programmatically. There's only one entry point to the library - the SparkLauncher class.
To launch a Spark application, just instantiate a SparkLauncher and configure the application to run. For example:
import org.apache.spark.launcher.SparkLauncher;

public class MyLauncher {
    public static void main(String[] args) throws Exception {
        Process spark = new SparkLauncher()
                .setAppResource("/my/app.jar")
                .setMainClass("my.spark.app.Main")
                .setMaster("local")
                .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
                .launch();
        spark.waitFor();
    }
}
You can set all the YARN-specific configuration with the setConf method and set the master to yarn-client or yarn-cluster.
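A hedged sketch of such a YARN submission, building on the example above (the jar path, main class, and queue name are placeholders, and HADOOP_CONF_DIR or YARN_CONF_DIR is assumed to point at your cluster configuration):

import org.apache.spark.launcher.SparkLauncher;

public class MyYarnLauncher {
    public static void main(String[] args) throws Exception {
        Process spark = new SparkLauncher()
                .setAppResource("/my/app.jar")              // placeholder application jar
                .setMainClass("my.spark.app.Main")          // placeholder driver main class
                .setMaster("yarn-cluster")                  // driver runs inside the YARN cluster
                .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
                .setConf(SparkLauncher.EXECUTOR_MEMORY, "2g")
                .setConf("spark.yarn.queue", "default")     // any YARN-specific setting goes through setConf
                .launch();
        spark.waitFor();
    }
}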
References: https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/launcher/package-summary.html