 

How to change Spark setting to allow spark.dynamicAllocation.enabled?

I'm running a Python script in pyspark and got the following error: NameError: name 'spark' is not defined

I looked it up and found that the reason is that spark.dynamicAllocation.enabled is not enabled yet.

According to Spark's documentation (https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-dynamic-allocation.html#spark_dynamicAllocation_enabled): spark.dynamicAllocation.enabled (default: false) controls whether dynamic allocation is enabled or not. It is assumed that spark.executor.instances is not set or is 0 (which is the default value).

Since the default setting is false, I need to change the Spark setting to enable spark.dynamicAllocation.enabled.

I installed Spark with Homebrew and didn't change its configuration.

How can I change the setting and enable spark.dynamicAllocation.enabled?

Thanks a lot.

asked Oct 25 '16 by mflowww


3 Answers

Question: How can I change the setting and enable spark.dynamicAllocation.enabled?

There are 3 options through which you can achieve this:
1) Modify the parameters mentioned below in spark-defaults.conf.
2) Pass the parameters below via --conf to your spark-submit (see the example after this list).
3) Programmatically specify the dynamic allocation config, as demonstrated below.
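
For option 2, the equivalent spark-submit invocation would look roughly like this (the script name is a placeholder, not from the original answer):

spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  your_script.py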

Programmatically, you can do it like this:

import org.apache.spark.SparkConf

val conf = new SparkConf()
      .setMaster("ClusterManager")   // placeholder: use your actual master, e.g. "yarn"
      .setAppName("test-executor-allocation-manager")
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.dynamicAllocation.minExecutors", "1")   // set() takes string values
      .set("spark.dynamicAllocation.maxExecutors", "2")
      .set("spark.shuffle.service.enabled", "true")   // external shuffle service, needed in standalone mode
answered Oct 12 '22 by Ram Ghadiyaram


There are several places you can set it. If you would like to enable it on a per job basis, set the following in each application:

conf.set("spark.dynamicAllocation.enabled","true")

If you want to set it for all jobs, edit the spark-defaults.conf file. In the Hortonworks distro it should be in

/usr/hdp/current/spark-client/conf/

Add the setting to your spark-defaults.conf and you should be good to go.
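
For reference, the added lines in spark-defaults.conf would look something like this (the shuffle-service setting is typically required alongside dynamic allocation; values shown are illustrative):

spark.dynamicAllocation.enabled   true
spark.shuffle.service.enabled     true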

answered Oct 12 '22 by Joe Widen


This is an issue that affects Spark installations made using other resources as well, such as the spark-ec2 script for installing on Amazon Web Services. From the Spark documentation, two values in SPARK_HOME/conf/spark-defaults.conf need to be set:

spark.shuffle.service.enabled   true
spark.dynamicAllocation.enabled true

See the Spark configuration documentation on dynamic allocation: https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation

If your installation has a spark-env.sh script in SPARK_HOME/conf, make sure that it does not have lines such as the following, or that they are commented out:

export SPARK_WORKER_INSTANCES=1    # or some other integer
export SPARK_EXECUTOR_INSTANCES=1  # or some other integer
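
Once the configuration is in place, one way to confirm it took effect is to read the active configuration back from a pyspark shell or script (a sketch, not part of the original answer):

from pyspark import SparkContext

sc = SparkContext.getOrCreate()  # picks up the config from spark-defaults.conf / spark-submit
print(sc.getConf().get("spark.dynamicAllocation.enabled", "false"))  # expect "true"
print(sc.getConf().get("spark.shuffle.service.enabled", "false"))    # expect "true"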
answered Oct 12 '22 by Peter Pearman