Spark standalone configuration having multiple executors

Tags:

pyspark

I'm trying to set up a standalone Spark 2.0 server to run an analytics function in parallel. To do this, I want a single worker with multiple executors.

I'm using:

Standalone Spark 2.0
8 cores
24 GB RAM
Windows Server 2008
pyspark (although this appears unrelated)

This is purely a proof of concept, but I want 8 executors, one per core.

I've tried to follow the other threads on this topic (e.g. Spark Standalone Number Executors/Cores Control), but for some reason it's not working for me.

My configuration is as follows:

conf\spark-defaults.conf

spark.cores.max = 8
spark.executor.cores = 1

I have also tried changing my spark-env.sh file, to no avail. Instead, the web UI shows that my 1 worker has only 1 executor, with all 8 cores assigned to it.


616

asked Oct 11 '16 20:10

WalkingDeadFan

2 Answers

I believe you mixed up local and standalone modes:

Local mode is a development tool in which all processes run inside a single JVM. An application runs in local mode when master is set to local, local[*] or local[n]. spark.executor.instances and spark.executor.cores are not applicable in local mode because there is only one embedded executor.
Standalone mode requires a standalone Spark cluster: a master node (which can be started with the SPARK_HOME/sbin/start-master.sh script) and at least one worker node (which can be started with the SPARK_HOME/sbin/start-slave.sh script).

SparkConf should be created with the master node's address (spark://host:port).
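To make the local/standalone distinction concrete, here is a minimal sketch in plain Python that classifies the master URL forms mentioned above. The helper names are hypothetical (they are not part of Spark), and only the forms listed in this answer are recognised; anything else is rejected.

```python
import re

# Hypothetical helpers, not a Spark API. They only recognise the master URL
# forms named in the answer: local, local[*], local[n] and spark://host:port.
LOCAL_MASTER = re.compile(r"local(\[(\*|\d+)\])?")
STANDALONE_MASTER = re.compile(r"spark://[^:/]+:\d+")


def master_mode(master: str) -> str:
    """Return 'local' or 'standalone' for the master URL forms above."""
    if LOCAL_MASTER.fullmatch(master):
        return "local"
    if STANDALONE_MASTER.fullmatch(master):
        return "standalone"
    raise ValueError(f"unrecognised master URL: {master!r}")


def executor_settings_apply(master: str) -> bool:
    """Executor sizing (e.g. spark.executor.cores) is ignored in local mode."""
    return master_mode(master) == "standalone"


if __name__ == "__main__":
    for url in ["local[*]", "local[8]", "spark://myhost:7077"]:
        print(url, master_mode(url), executor_settings_apply(url))
```

In pyspark the standalone address would be passed as, for example, SparkConf().setMaster("spark://HOST:7077") or spark-submit --master spark://HOST:7077, where HOST is a placeholder and 7077 is the standalone master's default port.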

101

answered Oct 17 '22 16:10

zero323

You first need to configure your Spark standalone cluster, then set the amount of resources needed by each individual Spark application you want to run.

In order to configure the cluster, you can try this:

In conf/spark-env.sh:

Set SPARK_WORKER_INSTANCES=10

which determines the number of worker instances (worker processes, not executors) per node (default: 1)

Set SPARK_WORKER_CORES=15

the number of cores one worker can use (default: all cores on the machine, e.g. 36 on a 36-core node)

Set SPARK_WORKER_MEMORY=55g

the total amount of memory that can be used on one machine (worker node) for running Spark programs.

Copy this configuration file to the same folder on all worker nodes, then start your cluster by running the scripts in sbin (sbin/start-all.sh, ...). Assuming 5 worker machines, the above configuration should give 5 (machines) * 10 (worker instances per machine) = 50 alive workers on the master's web interface (http://localhost:8080 by default).
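The configuration step above can be sketched with two small hypothetical Python helpers (not part of Spark): one renders the spark-env.sh lines, which must use shell syntax (NAME=value, no spaces around =) because the file is sourced as a shell script, and one does the worker-count arithmetic. The numbers used are the ones from this answer's example.

```python
from typing import List, Optional


def spark_env_lines(worker_instances: int = 1,
                    worker_cores: Optional[int] = None,
                    worker_memory: Optional[str] = None) -> List[str]:
    """Render conf/spark-env.sh assignments (no spaces around '=')."""
    if worker_instances > 1 and worker_cores is None:
        # Spark's standalone docs advise setting SPARK_WORKER_CORES whenever
        # several workers share a machine; otherwise each tries to use all cores.
        raise ValueError("set worker_cores when worker_instances > 1")
    lines = [f"SPARK_WORKER_INSTANCES={worker_instances}"]
    if worker_cores is not None:
        lines.append(f"SPARK_WORKER_CORES={worker_cores}")
    if worker_memory is not None:
        lines.append(f"SPARK_WORKER_MEMORY={worker_memory}")
    return lines


def expected_alive_workers(machines: int, worker_instances: int) -> int:
    """Workers the master UI should list once every machine's workers are up."""
    return machines * worker_instances


if __name__ == "__main__":
    # The answer's example: 10 workers of 15 cores / 55g each, on 5 machines.
    print("\n".join(spark_env_lines(10, 15, "55g")))
    print(expected_alive_workers(5, 10))
```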

When you run an application in standalone mode, by default it acquires all available cores in the cluster. You need to set the resources for the application explicitly, e.g.:

val conf = new SparkConf()
  .setMaster(...)
  .setAppName(...)
  .set("spark.executor.memory", "2g")
  .set("spark.cores.max", "10")
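To see how spark.cores.max, spark.executor.cores and spark.executor.memory interact, here is a simplified Python model of how a standalone master might place one application's executors on its workers. It assumes the default "spread out" placement (each usable worker receives one chunk of cores per round), ignores dynamic allocation and every other setting, and all names in it are hypothetical. It is a sketch for reasoning about the numbers, not Spark's code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Worker:
    cores: int       # cores the worker offers (e.g. SPARK_WORKER_CORES)
    memory_mb: int   # memory the worker offers (e.g. SPARK_WORKER_MEMORY)


def place_executors(workers: List[Worker], executor_memory_mb: int,
                    executor_cores: Optional[int] = None,
                    cores_max: Optional[int] = None) -> List[List[int]]:
    """Return, for each worker, the core counts of the executors placed on it.

    executor_cores=None models leaving spark.executor.cores unset: each worker
    then hosts at most one executor that grows to all the cores it is granted.
    cores_max=None models leaving spark.cores.max unset (take every core).
    """
    step = executor_cores or 1
    one_per_worker = executor_cores is None
    cores_left = sum(w.cores for w in workers)
    if cores_max is not None:
        cores_left = min(cores_left, cores_max)
    executors: List[List[int]] = [[] for _ in workers]

    def can_launch(i: int) -> bool:
        w, execs = workers[i], executors[i]
        if cores_left < step or w.cores - sum(execs) < step:
            return False
        if one_per_worker and execs:
            return True  # only adds cores to the existing executor
        # A new executor also needs executor_memory_mb of free memory.
        return w.memory_mb - len(execs) * executor_memory_mb >= executor_memory_mb

    free = [i for i in range(len(workers)) if can_launch(i)]
    while free:
        # Spread out: each usable worker receives one chunk per round.
        for i in free:
            if can_launch(i):
                cores_left -= step
                if one_per_worker and executors[i]:
                    executors[i][0] += step
                else:
                    executors[i].append(step)
        free = [i for i in free if can_launch(i)]
    return executors
```

With the question's settings (one 8-core worker offering 23 GB, assuming the default of total memory minus 1 GB, the default 1g executor memory, spark.executor.cores=1 and spark.cores.max=8), place_executors([Worker(8, 23 * 1024)], 1024, executor_cores=1, cores_max=8) yields eight 1-core executors on the single worker, while leaving spark.executor.cores unset yields one 8-core executor. So with a spark:// master, a single worker is enough; the number of executors comes from spark.executor.cores rather than from more workers.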

answered Oct 17 '22 17:10

dilshad
