 

What does setMaster `local[*]` mean in spark?

I found some code to start spark locally with:

val conf = new SparkConf().setAppName("test").setMaster("local[*]")
val ctx = new SparkContext(conf)

What does the [*] mean?

asked Sep 02 '15 by Freewind

People also ask

What is local * In Spark?

local[*] runs Spark locally with as many worker threads as there are logical cores on your machine. The code you have given sets a master URL that makes Spark run locally, using all of the threads (logical cores) on your machine.
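A quick way to see this in practice (a minimal sketch; the app name "cores-check" is just a placeholder): with local[*], Spark's default parallelism should match the number of logical cores the JVM reports.

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("cores-check").setMaster("local[*]")
val sc = new SparkContext(conf)

// Both numbers should normally agree in local[*] mode
println(s"Logical cores (JVM): ${Runtime.getRuntime.availableProcessors()}")
println(s"Spark default parallelism: ${sc.defaultParallelism}")

sc.stop()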

How do I run a local script in Spark?

Spark provides the spark-submit command to execute an application file, whether it is written in Scala or Java (packaged as a jar), Python, or R. The command is: $ spark-submit --master <url> <SCRIPTNAME>.py
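For Scala, the application is packaged as a jar first. A rough sketch of such an application (the object name SimpleApp and the jar path are placeholders, not something from the original post):

// Build into a jar, then launch with e.g.:
//   $ spark-submit --master local[*] --class SimpleApp target/simple-app.jar
import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]): Unit = {
    // No setMaster here: the master URL comes from --master on the command line
    val conf = new SparkConf().setAppName("SimpleApp")
    val sc = new SparkContext(conf)
    val evens = sc.parallelize(1 to 1000).filter(_ % 2 == 0).count()
    println(s"Even numbers: $evens")
    sc.stop()
  }
}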

What is master URL in Spark?

Check the master's web UI, e.g. http://master:8080 for a standalone cluster, where master points to the Spark master machine. There you will see the Spark master URI, which is spark://master:7077 by default; quite a bit of other information lives there as well if you have a Spark standalone cluster.

How do I run a local Spark cluster?

To run an application on a Spark standalone cluster, simply pass the spark://IP:PORT URL of the master to the SparkContext constructor. You can also pass the option --total-executor-cores <numCores> to control the number of cores that spark-shell uses on the cluster.
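Programmatically, that looks roughly like the sketch below (the host name and port are placeholders; spark.cores.max is the configuration counterpart of the --total-executor-cores flag):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("cluster-app")
  .setMaster("spark://master-host:7077")   // your standalone master URL
  .set("spark.cores.max", "4")             // cap total executor cores on the cluster
val sc = new SparkContext(conf)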


2 Answers

From the doc:

./bin/spark-shell --master local[2] 

The --master option specifies the master URL for a distributed cluster, or local to run locally with one thread, or local[N] to run locally with N threads. You should start by using local for testing.

And from here:

local[*] : Run Spark locally with as many worker threads as logical cores on your machine.
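To make the "worker threads" wording concrete, here is a small sketch (assuming local[2]; exact thread names may vary by Spark version) that prints the thread each partition is processed on:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("threads-demo").setMaster("local[2]")
val sc = new SparkContext(conf)

// With local[2], tasks run inside (at most) two task threads of the driver JVM
sc.parallelize(1 to 8, numSlices = 4)
  .foreachPartition(_ => println(s"running on ${Thread.currentThread().getName}"))

sc.stop()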

answered by ccheneson

The supported master URL formats and their meanings (a short usage sketch follows the list):


local : Run Spark locally with one worker thread (i.e. no parallelism at all).


local[K] : Run Spark locally with K worker threads (ideally, set this to the number of cores on your machine).


local[K,F] : Run Spark locally with K worker threads and F maxFailures (see spark.task.maxFailures for an explanation of this variable).


local[*] : Run Spark locally with as many worker threads as logical cores on your machine.


local[*,F] : Run Spark locally with as many worker threads as logical cores on your machine and F maxFailures.


spark://HOST:PORT : Connect to the given Spark standalone cluster master. The port must be whichever one your master is configured to use, which is 7077 by default.


spark://HOST1:PORT1,HOST2:PORT2 : Connect to the given Spark standalone cluster with standby masters using ZooKeeper. The list must include all the master hosts in the high-availability cluster set up with ZooKeeper. The port must be whichever each master is configured to use, which is 7077 by default.


mesos://HOST:PORT : Connect to the given Mesos cluster. The port must be whichever you have configured to use, which is 5050 by default. Or, for a Mesos cluster using ZooKeeper, use mesos://zk://.... To submit with --deploy-mode cluster, the HOST:PORT should be configured to connect to the MesosClusterDispatcher.


yarn : Connect to a YARN cluster in client or cluster mode depending on the value of --deploy-mode. The cluster location will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.

https://spark.apache.org/docs/latest/submitting-applications.html
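As a rough sketch of how these URL forms are passed programmatically (every host name and port below is a placeholder; in practice the master is usually supplied via spark-submit --master rather than hard-coded):

import org.apache.spark.{SparkConf, SparkContext}

val masterUrl = "local[4,2]"             // 4 worker threads, spark.task.maxFailures = 2
// val masterUrl = "local[*]"            // all logical cores
// val masterUrl = "spark://host:7077"   // standalone cluster
// val masterUrl = "mesos://host:5050"   // Mesos cluster
// val masterUrl = "yarn"                // YARN; needs HADOOP_CONF_DIR or YARN_CONF_DIR

val conf = new SparkConf().setAppName("master-url-demo").setMaster(masterUrl)
val sc = new SparkContext(conf)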

answered by FreeMan