 

How to run Spark interactively in cluster mode

I have a Spark cluster running on

spark://host1:7077
spark://host2:7077
spark://host3:7077

and I connect to it with /bin/spark-shell --master spark://host1:7077. When I try to read a file with:

val textFile = sc.textFile("README.md")
textFile.count()

the shell prints:

WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

When I check the web UI at host1:8080, it shows:

Workers: 0
Cores: 0 Total, 0 Used
Memory: 0.0 B Total, 0.0 B Used
Applications: 0 Running, 2 Completed
Drivers: 0 Running, 0 Completed
Status: ALIVE

My question is: how do I specify cores and memory when running spark-shell against the cluster? Or do I have to package my Scala code into a .jar file and then submit the job to Spark?

Thanks

asked Apr 22 '15 by user2829759

People also ask

How do I run a Spark application in cluster mode?

Once connected, Spark acquires executors on nodes in the cluster, which are processes that run computations and store data for your application. Next, it sends your application code (defined by JAR or Python files passed to SparkContext) to the executors. Finally, SparkContext sends tasks to the executors to run.
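
As a rough illustration (the app name and jar path below are placeholders; only the master URL comes from the question), passing your application jar to the SparkContext in Scala can look like this:

    import org.apache.spark.{SparkConf, SparkContext}

    // Placeholder values: replace the app name and jar path with your own.
    val conf = new SparkConf()
      .setAppName("example-app")
      .setMaster("spark://host1:7077")
      .setJars(Seq("/path/to/example-app.jar")) // this jar is shipped to the executors

    val sc = new SparkContext(conf)
    // sc now sends tasks to the executors that the master allocated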

Can I run Spark Shell in cluster mode?

Depending on the resource manager, Spark can run in two modes: local mode and cluster mode. The resource manager is specified with a command-line option called --master.
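
For example, the same shell can be pointed at either a local or a standalone cluster master (host1 is the master from the question):

    /bin/spark-shell --master local[4]            # local mode, 4 threads
    /bin/spark-shell --master spark://host1:7077  # standalone cluster mode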

Can we run the Spark-submit in local mode in cluster?

No. The spark-submit parameters num-executors, executor-cores, and executor-memory do not work in local mode, because they apply only when you deploy your Spark job on a cluster rather than on a single machine; they only take effect when you run the job in client or cluster mode.
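
As a hedged example (the class name, jar path, and sizes are made up), a submission against the standalone master from the question could look like the following. Note that --num-executors is primarily a YARN option; on a standalone cluster the executor count usually falls out of --total-executor-cores divided by --executor-cores:

    /bin/spark-submit \
      --master spark://host1:7077 \
      --deploy-mode client \
      --executor-memory 2g \
      --executor-cores 2 \
      --total-executor-cores 6 \
      --class com.example.MyApp \
      /path/to/my-app.jar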


1 Answer

Package your code into a jar and use something like this in your code:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    // sparkJobJar, sparkWorkerMemory and sparkParallelism are placeholders for your own values
    String[] jars = new String[] { sparkJobJar };
    SparkConf sparkConf = new SparkConf()
            .setAppName("my-spark-job")
            .setMaster("spark://host1:7077") // full spark:// URL of the master, not just an IP
            .setJars(jars);                  // ship the application jar to the executors

    sparkConf.set("spark.executor.memory", sparkWorkerMemory);    // e.g. "2g"
    sparkConf.set("spark.default.parallelism", sparkParallelism); // e.g. "12"
    JavaSparkContext ctx = new JavaSparkContext(sparkConf);

spark.executor.memory sets the memory each executor gets on the workers, and spark.default.parallelism controls the number of parallel tasks running on the cluster.
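
If you would rather stay in the interactive shell than hard-code these settings, the same values can be passed on the spark-shell command line (the numbers below are only examples):

    /bin/spark-shell --master spark://host1:7077 \
      --executor-memory 2g \
      --total-executor-cores 6 \
      --conf spark.default.parallelism=12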

There is a slaves file in ../spark/conf; put the hostnames (or IPs) of your slave nodes in it, one per line.
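
Using the hosts from the question, and assuming you want a worker on each of the three machines, conf/slaves is just one hostname per line:

    host1
    host2
    host3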

Start the master on the master node with /spark/sbin/start-master.sh

Then start the workers with /spark/sbin/start-slaves.sh (run it on the master node; it reads the slaves file and starts a worker on each listed host). Once the workers have registered, the web UI on host1:8080 should show a non-zero worker and core count, and your job will be able to acquire resources.

answered Oct 03 '22 by Sandesh Deshmane