
Using spark-submit, what is the behavior of the --total-executor-cores option?

I am running a Spark cluster over C++ code wrapped in Python. I am currently testing different multi-threading configurations (at the Python level or the Spark level).

I am using Spark with the standalone binaries, over an HDFS 2.5.4 cluster. The cluster currently consists of 10 slaves, with 4 cores each.

From what I can see, by default, Spark launches 4 slaves per node (I have 4 Python processes working on a slave node at a time).

How can I limit this number? I can see that there is a --total-executor-cores option for spark-submit, but there is little documentation on how it impacts the distribution of executors over the cluster!

I will run tests to get a clear idea, but if someone knowledgeable has a clue about what this option does, it would help.

Update:

I went through the Spark documentation again; here is what I understand:

  • By default, I have one executor per worker node (here 10 worker nodes, hence 10 executors)
  • However, each worker can run several tasks in parallel. In standalone mode, the default behavior is to use all available cores, which explains why I can observe 4 Python processes per node.
  • To limit the number of cores used per worker, and thus the number of parallel tasks, I have at least 3 options (see the sketch after this list):
    • use --total-executor-cores with spark-submit (least satisfactory, since there is no clue on how the pool of cores is dealt with)
    • use SPARK_WORKER_CORES in the configuration file
    • use the -c option with the worker start scripts
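For reference, a minimal sketch of the three options, assuming a standalone master at spark://master:7077 and an application file my_app.py (both names are placeholders of mine):

# Option 1: cap the total number of cores the job may take, cluster-wide
$ ./bin/spark-submit --master spark://master:7077 \
  --total-executor-cores 10 \
  my_app.py

# Option 2: in conf/spark-env.sh on each slave, cap what a worker offers
SPARK_WORKER_CORES=1

# Option 3: pass the cap when starting a worker by hand
# (script name and signature vary across Spark versions)
$ ./sbin/start-slave.sh -c 1 spark://master:7077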

The following lines of the documentation (http://spark.apache.org/docs/latest/spark-standalone.html) helped me figure out what is going on:

SPARK_WORKER_INSTANCES
Number of worker instances to run on each machine (default: 1). You can make this more than 1 if you have very large machines and would like multiple Spark worker processes. If you do set this, make sure to also set SPARK_WORKER_CORES explicitly to limit the cores per worker, or else each worker will try to use all the cores.
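As a sketch, the two variables combined in conf/spark-env.sh (the values are only illustrative for the 4-core machines described above):

# conf/spark-env.sh on each slave
SPARK_WORKER_INSTANCES=2   # run two worker processes per machine...
SPARK_WORKER_CORES=2       # ...each limited to 2 of the 4 cores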

What is still unclear to me is why, in my case, it is better to limit the number of parallel tasks per worker node to 1 and rely on the multithreading of my legacy C++ code. I will update this post with experiment results when I finish my study.
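A related knob, not mentioned elsewhere in this thread, is spark.task.cpus: it declares how many cores each task will itself consume, which is the usual way to reserve cores for multithreaded code running inside a task. A sketch, assuming each task drives 4 C++ threads (master URL and application file are placeholders):

$ ./bin/spark-submit --master spark://master:7077 \
  --conf spark.task.cpus=4 \
  my_app.py

With 4 cores per worker, this schedules at most one task per node, so the C++ threads are not competing with other tasks for cores.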

asked May 04 '15 by MathiasOrtner



2 Answers

The documentation does not seem clear.

In my experience, the most common way to allocate resources is to indicate the number of executors and the number of cores per executor, for example (taken from here):

$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--num-executors 10 \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 4 \
--queue thequeue \
lib/spark-examples*.jar \
10

However, this approach is limited to YARN, and is not applicable to standalone and Mesos-based Spark, according to this.

Instead, the parameter --total-executor-cores can be used, which represents the total number of cores, across all executors, assigned to the Spark job. In your case, with a total of 40 cores, setting --total-executor-cores 40 would make use of all the available resources.
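A standalone-mode sketch of such a submission (master URL and jar path are placeholders):

$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master spark://master:7077 \
  --executor-memory 2g \
  --total-executor-cores 40 \
  lib/spark-examples*.jar \
  10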

Unfortunately, I am not aware of how Spark distributes the workload when fewer resources than the total available are requested. When working with two or more simultaneous jobs, however, it should be transparent to the user, in that Spark (or whatever resource manager is in place) would arbitrate the resources depending on the user settings.
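For what it is worth (my addition, not part of the original answer): the standalone master has a spark.deploy.spreadOut property that controls whether an application's cores are spread across as many nodes as possible, which is the default, or consolidated onto as few nodes as possible. It is set on the master, for example:

# conf/spark-env.sh on the master (true is the default behavior)
SPARK_MASTER_OPTS="-Dspark.deploy.spreadOut=true"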

answered by Mikel Urkia

To check how many workers are started on each slave, open a web browser at http://master-ip:8080 and look at the Workers section to see exactly how many workers have been started, and which worker runs on which slave. (I mention this because I am not sure what you mean by '4 slaves per node'.)

By default, Spark starts exactly 1 worker on each slave, unless you specify SPARK_WORKER_INSTANCES=n in conf/spark-env.sh, where n is the number of worker instances you would like to start on each slave.

When you submit a Spark job through spark-submit, Spark starts an application driver and several executors for your job.

  • If not specified otherwise, Spark starts one executor on each worker, i.e. the total number of executors equals the total number of workers, and all cores are available to the job.
  • --total-executor-cores limits the total number of cores that are available to this application.
answered by yjshen