I'm running a python script in pyspark and got the following error: NameError: name 'spark' is not defined
I looked it up and found that the reason is that spark.dynamicAllocation.enabled is not enabled yet.
According to Spark's documentation (https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-dynamic-allocation.html#spark_dynamicAllocation_enabled): spark.dynamicAllocation.enabled (default: false) controls whether dynamic allocation is enabled or not. It is assumed that spark.executor.instances is not set or is 0 (which is the default value).
Since the default setting is false, I need to change the Spark setting to enable spark.dynamicAllocation.enabled.
I installed Spark with brew, and didn't change its configuration/setting.
How can I change the setting and enable spark.dynamicAllocation.enabled?
Thanks a lot.
Question: How can I change the setting and enable spark.dynamicAllocation.enabled?
There are three ways you can achieve this:
1) Modify the parameters below in spark-defaults.conf.
2) Pass the parameters below via --conf in your spark-submit.
3) Programmatically specify the dynamic allocation config, as demonstrated below.
Programmatically (here in Scala), you can do it like this:
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setMaster("ClusterManager")  // placeholder: your master URL, e.g. "yarn" or "spark://host:7077"
  .setAppName("test-executor-allocation-manager")
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "1")  // SparkConf.set expects string values
  .set("spark.dynamicAllocation.maxExecutors", "2")
  .set("spark.shuffle.service.enabled", "true")  // required for standalone mode
There are several places you can set it. If you would like to enable it on a per-job basis, set the following in each application:
conf.set("spark.dynamicAllocation.enabled", "true")
If you want to set it for all jobs, navigate to the Spark conf directory. In the Hortonworks distro it should be
/usr/hdp/current/spark-client/conf/
Add the setting to your spark-defaults.conf and you should be good to go.
This is an issue that affects Spark installations made using other resources as well, such as the spark-ec2 script for installing on Amazon Web Services. From the Spark documentation, two values in SPARK_HOME/conf/spark-defaults.conf need to be set:
spark.shuffle.service.enabled true
spark.dynamicAllocation.enabled true
See this: https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation
If your installation has a spark-env.sh script in SPARK_HOME/conf, make sure that it does not have lines such as the following, or that they are commented out:
export SPARK_WORKER_INSTANCES=1    # or some other integer
export SPARK_EXECUTOR_INSTANCES=1  # or some other integer
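To confirm the settings took effect, you can read them back from a running session. This assumes an active SparkSession bound to the name spark, for example the one created in the PySpark snippet earlier:

# Each call should print 'true' once the configuration has been picked up.
print(spark.conf.get("spark.dynamicAllocation.enabled"))
print(spark.conf.get("spark.shuffle.service.enabled"))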