I have a Spark cluster running on
spark://host1:7077
spark://host2:7077
spark://host3:7077
and I connect through /bin/spark-shell --master spark://host1:7077
When trying to read a file with:
val textFile = sc.textFile("README.md")
textFile.count()
the shell prints the warning:
WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
When I check the web UI at host1:8080, it shows:
Workers: 0
Cores: 0 Total, 0 Used
Memory: 0.0 B Total, 0.0 B Used
Applications: 0 Running, 2 Completed
Drivers: 0 Running, 0 Completed
Status: ALIVE
My question is: how do I specify cores and memory when running spark-shell in cluster mode? Or do I have to package my Scala code into a .jar file and then submit the job to Spark with spark-submit?
Thanks
Once connected, Spark acquires executors on nodes in the cluster, which are processes that run computations and store data for your application. Next, it sends your application code (defined by JAR or Python files passed to SparkContext) to the executors. Finally, SparkContext sends tasks to the executors to run.
Depending on the resource manager, Spark can run in two modes: local mode and cluster mode. The resource manager is specified with the --master command-line option, for example as shown below.
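A minimal sketch of the two forms, reusing the standalone master URL from the question (local[*] simply means "use all cores of the local machine"):

# Local mode: driver and executors all run in a single JVM on this machine
./bin/spark-shell --master local[*]

# Standalone cluster mode: connect to the standalone master
./bin/spark-shell --master spark://host1:7077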
No, the spark-submit parameters num-executors, executor-cores and executor-memory won't work in local mode, because they are meant for deploying a Spark job on a cluster rather than on a single machine; they only take effect when you run the job in client or cluster deploy mode.
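This is also the direct answer to the question above: pass the resource flags to spark-shell when connecting to the standalone master. A sketch with placeholder values (2g and 4 are examples only; --num-executors is YARN-specific, so on a standalone cluster the core count is capped with --total-executor-cores instead):

./bin/spark-shell --master spark://host1:7077 \
  --executor-memory 2g \
  --total-executor-cores 4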
Alternatively, package your code into a jar and use something like the following in your code:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
String[] jars = new String[] { sparkJobJar };    // sparkJobJar, sparkWorkerMemory and sparkParallelism are your own values
SparkConf sparkConf = new SparkConf()
        .setAppName("MyJob")                     // any application name
        .setMaster("spark://host1:7077")         // full standalone master URL, not just an IP
        .setJars(jars)                           // ship the application jar to the executors
        .set("spark.executor.memory", sparkWorkerMemory)
        .set("spark.default.parallelism", sparkParallelism);
JavaSparkContext ctx = new JavaSparkContext(sparkConf);
With spark.executor.memory you set the memory available to each executor, and spark.default.parallelism controls the default number of parallel tasks running on the cluster.
You have a slaves file in ../spark/conf; put the IPs (or hostnames) of your worker nodes in it, one per line.
Please start the master on the master node: /spark/sbin/start-master.sh
Please start the workers from the master node: /spark/sbin/start-slaves.sh (this script reads conf/slaves and starts a worker on each listed node).
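A minimal sketch of that setup, assuming the three hosts from the question are the worker nodes and Spark lives under /spark on each of them:

# /spark/conf/slaves: one worker host per line
host1
host2
host3

# Run on the master node (host1):
/spark/sbin/start-master.sh
/spark/sbin/start-slaves.sh   # SSHes to each host in conf/slaves and starts a worker there

Once the workers have registered, the master web UI at host1:8080 should show non-zero Workers, Cores and Memory, and the "Initial job has not accepted any resources" warning should disappear.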