If I run a Spark program in spark-shell, is it possible for it to hog the entire Hadoop cluster for hours?
Usually there are settings such as num-executors and executor-cores, for example:
spark-shell --driver-memory 10G --executor-memory 15G --executor-cores 8
But if they are not specified and I just run "spark-shell", will it consume the entire cluster, or are there reasonable defaults?
spark.executor.instances controls the number of executors to be used. Its spark-submit option is --num-executors. If it is not set, the default is 2 (in YARN mode).
The consensus in most Spark tuning guides is that 5 cores per executor is the optimal number of cores for parallel processing.
Following that recommendation, for a cluster of 10 nodes with 16 cores each:
- Leave 1 core per node for the Hadoop/YARN daemons => cores available per node = 16 - 1 = 15.
- Total cores available in the cluster = 15 x 10 = 150.
- Number of executors = total cores / cores per executor = 150 / 5 = 30.
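Applied to that sizing, a launch command might look something like the line below. The --num-executors and --executor-cores values come from the calculation above; the memory figure is only an assumption for illustration and should be derived from each node's RAM, leaving headroom for YARN overhead:
spark-shell --num-executors 30 --executor-cores 5 --executor-memory 19G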
Spark architecture: the central coordinator is called the Spark Driver, and it communicates with all the Workers. Each Worker node hosts one or more Executors, which are responsible for running the Tasks.
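To make that concrete, here is a minimal sketch you could paste into a spark-shell session; the lines typed at the prompt run in the driver, while the work inside the RDD operations is split into tasks that run on the executors:
// Typed at the spark-shell prompt: this line runs in the driver.
val data = sc.parallelize(1 to 1000, 10)   // 10 partitions => 10 tasks
// The map and reduce work is shipped to the executors as tasks.
val total = data.map(_ * 2).reduce(_ + _)
// The reduced result is sent back to the driver and printed there.
println(total)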
The default values for most configuration properties can be found in the Spark Configuration documentation. For the configuration properties in your example, the defaults are:
- spark.driver.memory = 1g
- spark.executor.memory = 1g
- spark.executor.cores = 1 in YARN mode, all the available cores on the worker in standalone mode.
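If you want to check which values an interactive session actually picked up, you can inspect the configuration from inside spark-shell itself (a quick sketch, Scala at the spark-shell prompt):
// Returns None when the property was not set explicitly, i.e. the default applies.
sc.getConf.getOption("spark.executor.memory")
sc.getConf.getOption("spark.executor.cores")
// Dump everything that was set explicitly (command line, spark-defaults.conf, etc.).
sc.getConf.getAll.foreach(println)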
Additionally, you can override these defaults by creating the file $SPARK_HOME/conf/spark-defaults.conf with the properties you want (as described in the Spark configuration documentation). Then, if the file exists with the desired values, you don't need to pass them as arguments to the spark-shell command.
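As an illustration, a spark-defaults.conf matching the values from the question might look like this (the spark.executor.instances line is an assumed cap added for the example, not something from the question):
# $SPARK_HOME/conf/spark-defaults.conf
spark.driver.memory       10g
spark.executor.memory     15g
spark.executor.cores      8
spark.executor.instances  4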