I am learning Spark on AWS EMR. In the process I am trying to understand the difference between the number of executors (--num-executors) and executor cores (--executor-cores). Can anyone please explain the difference?
Also, when I try to submit the following job, I get an error:
spark-submit --deploy-mode cluster --master yarn --num-executors 1 --executor-cores 5 --executor-memory 1g -–conf spark.yarn.submit.waitAppCompletion=false wordcount.py s3://test/spark-example/input/input.txt s3://test/spark-example/output21
Error: Unrecognized option: -–conf
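The "Unrecognized option" error is unrelated to Spark tuning: the second dash in -–conf is an en-dash (–), not an ASCII hyphen, most likely introduced by copying the command from a web page or word processor. Retyped with two plain hyphens, the same command submits cleanly:

spark-submit --deploy-mode cluster --master yarn --num-executors 1 --executor-cores 5 --executor-memory 1g --conf spark.yarn.submit.waitAppCompletion=false wordcount.py s3://test/spark-example/input/input.txt s3://test/spark-example/output21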
The --executor-cores property controls the number of concurrent tasks an executor can run. --executor-cores 5 means that each executor can run a maximum of five tasks at the same time.
The consensus in most Spark tuning guides is that 5 cores per executor is the optimal number for parallel processing.
Applying that recommendation to an example cluster of 10 nodes with 16 cores each: leave 1 core per node for the Hadoop/YARN daemons, so cores available per node = 16 - 1 = 15. Total available cores in the cluster = 15 x 10 = 150. Number of available executors = (total cores / cores per executor) = 150 / 5 = 30.
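Put together as a command line, that sizing would look roughly like the sketch below (the executor-memory value and application name are illustrative placeholders, not derived from the calculation above):

spark-submit --deploy-mode cluster --master yarn --num-executors 30 --executor-cores 5 --executor-memory 4g your_app.py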
The number of executors to launch is set with the spark-submit option --num-executors (there is no --max-executors option); on YARN, if it is not set, the default is 2. An upper bound only comes into play with dynamic allocation, where it is controlled by the spark.dynamicAllocation.maxExecutors configuration property.
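If you do want an upper bound, here is a sketch using the question's command (the cap of 30 is illustrative; dynamic allocation also requires the external shuffle service, which EMR typically has enabled already):

spark-submit --deploy-mode cluster --master yarn --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.maxExecutors=30 --executor-cores 5 --executor-memory 1g wordcount.py s3://test/spark-example/input/input.txt s3://test/spark-example/output21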
Number of executors is the number of distinct YARN containers (think processes/JVMs) that will execute your application.
Number of executor-cores is the number of threads you get inside each executor (container).
So the parallelism (number of concurrent threads/tasks running) of your Spark application is #executors x #executor-cores. If you have 10 executors and 5 executor-cores you will have (hopefully) 50 tasks running at the same time.
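For example, a submission that should yield those 50 concurrent tasks, assuming your data has at least 50 partitions (the memory value and application name are again placeholders):

spark-submit --deploy-mode cluster --master yarn --num-executors 10 --executor-cores 5 --executor-memory 4g your_app.py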