I would like to develop a Scala application which connects to a master and runs a piece of Spark code. I would like to achieve this without using spark-submit. Is this possible? In particular, I would like to know whether the following code can run from my machine and connect to a cluster:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf()
  .setAppName("Meisam")
  .setMaster("yarn-client")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
val df = sqlContext.sql("SELECT * FROM myTable")
...
No, you don't need spark-submit for this. However, if you run on a cluster, you will need some form of shared file system (for example, NFS mounted at the same path on each node). If you have that kind of filesystem, you can simply deploy Spark in standalone mode.
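As a minimal sketch, assuming a standalone master is already running at a hypothetical spark://master-host:7077, the question's code only needs a different master URL:

import org.apache.spark.{SparkConf, SparkContext}

// Connect directly to a standalone master instead of YARN.
val conf = new SparkConf()
  .setAppName("Meisam")
  .setMaster("spark://master-host:7077") // placeholder host, default standalone port
val sc = new SparkContext(conf)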
A Spark program implicitly builds a logical directed acyclic graph (DAG) of operations. When the driver program runs, it converts this logical graph into a physical execution plan. Transformations are lazy; an action such as collect is what actually triggers execution, gathers the data, and returns a final result to the driver.
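For example, a small sketch using the sc from the question's code:

// filter and map are lazy transformations: they only extend the DAG,
// nothing runs on the cluster yet.
val numbers = sc.parallelize(1 to 1000)
val doubledEvens = numbers.filter(_ % 2 == 0).map(_ * 2)

// collect is an action: Spark turns the DAG into a physical plan,
// runs the job, and returns the results to the driver.
val result: Array[Int] = doubledEvens.collect()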
Using the --master option, you specify which cluster manager to use to run your application. Spark currently supports YARN, Mesos, Kubernetes, standalone, and local.
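Set programmatically, the corresponding master URLs look like this (host names and ports are placeholders):

// Pick exactly one of these when building the SparkConf:
conf.setMaster("local[*]")                    // run locally, one thread per core
conf.setMaster("spark://master-host:7077")    // Spark standalone
conf.setMaster("yarn")                        // YARN (cluster located via HADOOP_CONF_DIR)
conf.setMaster("mesos://master-host:5050")    // Mesos
conf.setMaster("k8s://https://api-host:6443") // Kubernetes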
Add a conf:
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("Meisam")
  .setMaster("yarn-client")
  // spark.driver.host: the address the executors use to connect back to the driver.
  .set("spark.driver.host", "127.0.0.1")
Yes, it's possible, and what you did is basically all that's needed to have tasks running on a YARN cluster in the client deploy mode (where the driver runs on the machine that launches the application).
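A sketch of the pieces that make the client-mode connection work (the jar path is a placeholder; the YARN-side requirement is that the client can find the cluster configuration):

import org.apache.spark.{SparkConf, SparkContext}

// Assumes HADOOP_CONF_DIR (or YARN_CONF_DIR) points at the cluster's
// configuration files, so the client can locate the ResourceManager.
val conf = new SparkConf()
  .setAppName("Meisam")
  .setMaster("yarn-client")
  // Ship the application jar so the YARN containers can load your classes.
  .setJars(Seq("/path/to/your-app.jar")) // placeholder path
val sc = new SparkContext(conf)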
spark-submit helps you keep your code free of the few SparkConf settings that are required for proper execution, such as the master URL. When you keep your code free of these low-level details, you can deploy your Spark application on any Spark cluster - YARN, Mesos, Spark Standalone, or local - without recompiling it.
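For example, a sketch of the same application with the master left out of the code (the class name, jar path, and master choice below are placeholders supplied at launch time):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// The cluster manager is chosen when the job is launched, e.g.:
//   spark-submit --master yarn --deploy-mode client \
//     --class com.example.MyApp target/my-app.jar
val conf = new SparkConf().setAppName("Meisam")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
val df = sqlContext.sql("SELECT * FROM myTable")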
As opposed to what has been said here, I think it's only partially possible, as I recently discovered the hard way, being the Spark newbie that I am. While you can definitely connect to a cluster as noted above and run code on it, you may run into problems as soon as you do anything non-trivial, even something as simple as using UDFs (user-defined functions, i.e. anything not already included in Spark). Have a look at https://issues.apache.org/jira/browse/SPARK-18075 and the other related tickets, and most importantly at the responses. Also, this seems useful (I'm having a look at it now): Submitting spark app as a yarn job from Eclipse and Spark Context