 

How to change Spark setting to allow spark.dynamicAllocation.enabled?

I'm running a Python script in pyspark and got the following error: NameError: name 'spark' is not defined

I looked it up and found that the reason is that spark.dynamicAllocation.enabled is not enabled yet.

According to Spark's documentation (https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-dynamic-allocation.html#spark_dynamicAllocation_enabled): spark.dynamicAllocation.enabled (default: false) controls whether dynamic allocation is enabled or not. It is assumed that spark.executor.instances is not set or is 0 (which is the default value).

Since the default setting is false, I need to change the Spark setting to enable spark.dynamicAllocation.enabled.

I installed Spark with Homebrew and didn't change its configuration.

How can I change the setting and enable spark.dynamicAllocation.enabled?

Thanks a lot.

asked Oct 25 '16 by mflowww


3 Answers

Question: How can I change the setting and enable spark.dynamicAllocation.enabled?

There are 3 options through which you can achieve this:
1) Modify the parameters mentioned below in spark-defaults.conf.
2) Pass the parameters below via --conf to your spark-submit (see the example after this list).
3) Programmatically specify the dynamic allocation config, as demonstrated below.
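
For option 2, the equivalent spark-submit invocation would look roughly like this (the script name is a placeholder, not from the original answer):

spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  your_script.py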

Programmatically, you can do it like this:

import org.apache.spark.SparkConf

val conf = new SparkConf()
      .setMaster("ClusterManager")   // placeholder: use your actual master, e.g. "yarn"
      .setAppName("test-executor-allocation-manager")
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.dynamicAllocation.minExecutors", "1")   // set() takes string values
      .set("spark.dynamicAllocation.maxExecutors", "2")
      .set("spark.shuffle.service.enabled", "true")   // external shuffle service, needed in standalone mode
answered Oct 12 '22 by Ram Ghadiyaram


There are several places you can set it. If you would like to enable it on a per job basis, set the following in each application:

conf.set("spark.dynamicAllocation.enabled","true")

If you want to set it for all jobs, edit the spark-defaults.conf file. In the Hortonworks distro it should be in

/usr/hdp/current/spark-client/conf/

Add the setting to your spark-defaults.conf and you should be good to go.
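
For reference, the added lines in spark-defaults.conf would look something like this (the shuffle-service setting is typically required alongside dynamic allocation; values shown are illustrative):

spark.dynamicAllocation.enabled   true
spark.shuffle.service.enabled     true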

answered Oct 12 '22 by Joe Widen


This is an issue that affects Spark installations made using other resources as well, such as the spark-ec2 script for installing on Amazon Web Services. From the Spark documentation, two values in SPARK_HOME/conf/spark-defaults.conf need to be set:

spark.shuffle.service.enabled   true
spark.dynamicAllocation.enabled true

See the Spark configuration documentation on dynamic allocation: https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation

If your installation has a spark-env.sh script in SPARK_HOME/conf, make sure that it does not have lines such as the following, or that they are commented out:

export SPARK_WORKER_INSTANCES=1    # or some other integer
export SPARK_EXECUTOR_INSTANCES=1  # or some other integer
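
Once the configuration is in place, one way to confirm it took effect is to read the active configuration back from a pyspark shell or script (a sketch, not part of the original answer):

from pyspark import SparkContext

sc = SparkContext.getOrCreate()  # picks up the config from spark-defaults.conf / spark-submit
print(sc.getConf().get("spark.dynamicAllocation.enabled", "false"))  # expect "true"
print(sc.getConf().get("spark.shuffle.service.enabled", "false"))    # expect "true"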
answered Oct 12 '22 by Peter Pearman