
Can Spark code be run on cluster without spark-submit?

I would like to develop a Scala application which connects to a master and runs a piece of Spark code. I would like to achieve this without using spark-submit. Is this possible? In particular, I would like to know whether the following code can run from my machine and connect to a cluster:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf()
  .setAppName("Meisam")
  .setMaster("yarn-client")

val sc = new SparkContext(conf)

val sqlContext = new SQLContext(sc)
val df = sqlContext.sql("SELECT * FROM myTable")

...
asked Nov 27 '15 by Meisam Emamjome
People also ask

Can Spark only run on a cluster?

No; Spark can also run locally. If you do run it on a cluster, you will need some form of shared file system (for example, NFS mounted at the same path on each node). With such a filesystem in place, you can simply deploy Spark in standalone mode.
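As a minimal sketch (the app name, host name and port below are placeholders, not from the question), pointing application code at a standalone cluster only takes the spark:// master URL:

import org.apache.spark.{SparkConf, SparkContext}

// "spark://master-host:7077" is an illustrative standalone master URL;
// replace it with your own master's host and port.
val standaloneConf = new SparkConf()
  .setAppName("StandaloneExample")
  .setMaster("spark://master-host:7077")
val standaloneSc = new SparkContext(standaloneConf)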

How does a Spark Program physically execute on a cluster?

A Spark program implicitly builds a logical directed acyclic graph (DAG) of operations. When the driver runs, it converts this logical graph into a physical execution plan. For example, collect is an action that gathers all the data and produces the final result.
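As a small illustration (not from the question; the numbers and variable names are made up), transformations such as map and filter only extend the logical DAG, while an action such as collect triggers the physical execution:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("DagExample").setMaster("local[*]")
val sc = new SparkContext(conf)

val numbers = sc.parallelize(1 to 10)    // an RDD; nothing has executed yet
val doubled = numbers.map(_ * 2)         // transformation: extends the logical DAG, still lazy
val evens   = doubled.filter(_ % 4 == 0) // transformation: still lazy
val result  = evens.collect()            // action: the DAG becomes stages and tasks and runs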

Can we run spark-submit in local mode on a cluster?

With the --master option you specify which cluster manager runs your application. Spark currently supports YARN, Mesos, Kubernetes, Standalone, and local.
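The same choice can be made in application code through the master URL given to setMaster; a sketch follows, where the host names and ports are placeholders:

import org.apache.spark.SparkConf

// Typical master URL forms:
//   "local[*]"                - local mode, all cores in one JVM
//   "spark://host:7077"       - Spark Standalone
//   "yarn-client" / "yarn"    - YARN (client or cluster deploy mode)
//   "mesos://host:5050"       - Mesos
//   "k8s://https://host:6443" - Kubernetes
val conf = new SparkConf()
  .setAppName("MasterExample")
  .setMaster("local[*]")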


3 Answers

Add spark.driver.host to the conf:

val conf = new SparkConf()
  .setAppName("Meisam")
  .setMaster("yarn-client")
  .set("spark.driver.host", "127.0.0.1")

answered Nov 07 '22 by xfreewind


Yes, it's possible, and what you did is basically all that's needed to have tasks running on a YARN cluster in client deploy mode (where the driver runs on the machine the application was launched from).

spark-submit lets you keep your code free of the few SparkConf settings that are required for proper execution, such as the master URL. When you keep your code free of those low-level details, you can deploy your Spark application on any Spark cluster - YARN, Mesos, Spark Standalone, or local - without recompiling it.
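As a sketch of that idea (the application name is arbitrary), the code hard-codes nothing about the cluster, and the master URL is supplied at launch time, for example via spark-submit --master or the spark.master property:

import org.apache.spark.{SparkConf, SparkContext}

// No setMaster here: the master URL comes from the environment at launch time
// (e.g. spark-submit --master yarn, or the spark.master system property), so the
// same jar can run on YARN, Mesos, Spark Standalone or locally without recompiling.
val conf = new SparkConf().setAppName("PortableApp")
val sc = new SparkContext(conf)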

answered Nov 07 '22 by Jacek Laskowski


As opposed to what has been said here, I think it's only partially possible, as I recently discovered the hard way, being the Spark newbie that I am. While you can definitely connect to a cluster as noted above and run code on it, you may run into problems as soon as you do anything non-trivial, even something as simple as using UDFs (user-defined functions, i.e. anything not already built into Spark). Have a look at https://issues.apache.org/jira/browse/SPARK-18075 and the other related tickets, and most importantly at the responses. Also, this seems useful (I'm having a look at it now): Submitting spark app as a yarn job from Eclipse and Spark Context
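For illustration only (the UDF, table, column and jar path below are hypothetical, and shipping the application jar via setJars is just one of the workarounds discussed in the linked tickets), this is the kind of code that can fail without spark-submit: the class generated for the UDF lives in the application jar, and the executors never receive that jar unless it is shipped explicitly:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf()
  .setAppName("UdfExample")
  .setMaster("yarn-client")
  // Hypothetical path: ship the jar that contains the UDF class to the executors.
  .setJars(Seq("/path/to/your-app.jar"))

val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// The anonymous function below is compiled into a class inside the application jar;
// without spark-submit the executors only see it if the jar is shipped (see setJars above).
sqlContext.udf.register("myUpper", (s: String) => s.toUpperCase)
val upper = sqlContext.sql("SELECT myUpper(name) FROM myTable")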

answered Nov 07 '22 by Ido.Schwartzman