 

How to set Apache Spark Executor memory

Since you are running Spark in local mode, setting spark.executor.memory won't have any effect, as you have noticed. The reason is that the Worker "lives" within the driver JVM process that is started when you launch spark-shell, and the default memory used for that process is 512M. You can increase it by setting spark.driver.memory to something higher, for example 5g. You can do that by either:

  • setting it in the properties file (default is $SPARK_HOME/conf/spark-defaults.conf),

    spark.driver.memory              5g
    
  • or by supplying the configuration setting at runtime

    $ ./bin/spark-shell --driver-memory 5g
    

Note that this cannot be achieved by setting it in the application, because by then it is already too late: the process has already started with some amount of memory.

The reason for 265.4 MB is that Spark dedicates spark.storage.memoryFraction * spark.storage.safetyFraction of the heap to storage memory, and by default they are 0.6 and 0.9. (The result is a bit below 512 MB * 0.6 * 0.9 = 276.5 MB because Spark works from the JVM's reported maximum heap, which comes in somewhat under the nominal 512M.)

512 MB * 0.6 * 0.9 ~ 265.4 MB

So be aware that not the whole amount of driver memory will be available for RDD storage.
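
For reference, here is a minimal Python sketch of that arithmetic. The "reported heap" value is my assumption: Runtime.maxMemory() for a 512M JVM returns somewhat less than the nominal figure, which is why the result lands near 265 MB rather than 276 MB; the two fractions are the storage settings used by older Spark versions, as mentioned above.

# Legacy storage-memory arithmetic (spark.storage.* settings).
nominal_heap_mb  = 512.0
reported_heap_mb = 491.5   # assumption: roughly what a -Xmx512m JVM reports via Runtime.maxMemory()
memory_fraction  = 0.6     # spark.storage.memoryFraction default
safety_fraction  = 0.9     # spark.storage.safetyFraction default

print(nominal_heap_mb  * memory_fraction * safety_fraction)   # 276.48
print(reported_heap_mb * memory_fraction * safety_fraction)   # ~265.4, the figure quoted above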

But when you start running this on a cluster, the spark.executor.memory setting will take over when calculating the amount to dedicate to Spark's memory cache.


Also note that for local mode you have to set the amount of driver memory before starting the JVM:

bin/spark-submit --driver-memory 2g --class your.class.here app.jar

This will start the JVM with 2G instead of the default 512M.
Details here:

For local mode you only have one executor, and this executor is your driver, so you need to set the driver's memory instead. That said, in local mode, by the time you run spark-submit, a JVM has already been launched with the default memory settings, so setting "spark.driver.memory" in your conf won't actually do anything for you. Instead, you need to run spark-submit as shown above.
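
If you want to confirm that the flag actually took effect, here is a quick diagnostic sketch (not from the original answer; it pokes at PySpark's internal sc._jvm py4j gateway, so treat it as a debugging aid rather than a stable API, and the script name is arbitrary):

from pyspark import SparkContext

# Submit with, e.g.:  bin/spark-submit --driver-memory 2g check_heap.py
sc = SparkContext(appName="heap-check")

# Runtime.maxMemory() is the driver JVM's usable heap; it should reflect
# --driver-memory (it reports slightly less than the nominal -Xmx value).
max_heap_mb = sc._jvm.java.lang.Runtime.getRuntime().maxMemory() / (1024.0 * 1024.0)
print("Driver max heap: %.0f MB" % max_heap_mb)

sc.stop()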


The answer submitted by Grega helped me to solve my issue. I am running Spark locally from a Python script inside a Docker container. Initially I was getting a Java out-of-memory error when processing some data in Spark. However, I was able to assign more memory by adding the following lines to my script:

conf=SparkConf()
conf.set("spark.driver.memory", "4g") 

Here is a full example of the Python script which I use to start Spark:

import os
import sys
import glob

spark_home = '<DIRECTORY WHERE SPARK FILES EXIST>/spark-2.0.0-bin-hadoop2.7/'
driver_home = '<DIRECTORY WHERE DRIVERS EXIST>'

if 'SPARK_HOME' not in os.environ:
    os.environ['SPARK_HOME'] = spark_home 

SPARK_HOME = os.environ['SPARK_HOME']

# Make the bundled PySpark and py4j libraries importable.
sys.path.insert(0, os.path.join(SPARK_HOME, "python"))
for lib in glob.glob(os.path.join(SPARK_HOME, "python", "lib", "*.zip")):
    sys.path.insert(0, lib)

from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql import SQLContext

# These options must be set before the SparkContext (and its driver JVM) is created.
conf = SparkConf()
conf.set("spark.executor.memory", "4g")
conf.set("spark.driver.memory", "4g")
conf.set("spark.cores.max", "2")
# JDBC and MongoDB drivers that should be on the driver's classpath.
conf.set("spark.driver.extraClassPath",
    driver_home + '/jdbc/postgresql-9.4-1201-jdbc41.jar:'
    + driver_home + '/jdbc/clickhouse-jdbc-0.1.52.jar:'
    + driver_home + '/mongo/mongo-spark-connector_2.11-2.2.3.jar:'
    + driver_home + '/mongo/mongo-java-driver-3.8.0.jar')

sc = SparkContext.getOrCreate(conf)

spark = SQLContext(sc)
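
Not part of the original script, but as a usage sketch: with the Postgres driver on spark.driver.extraClassPath above, the SQLContext can read over JDBC roughly like this (host, database, table and credentials are placeholders):

# Hypothetical usage example; connection details are placeholders.
df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://<HOST>:5432/<DATABASE>") \
    .option("dbtable", "<TABLE>") \
    .option("user", "<USER>") \
    .option("password", "<PASSWORD>") \
    .option("driver", "org.postgresql.Driver") \
    .load()

df.show(5)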

Apparently, the question never says to run in local mode, not on YARN. Somehow I couldn't get the spark-defaults.conf change to work. Instead I tried this and it worked for me:

bin/spark-shell --master yarn --num-executors 6  --driver-memory 5g --executor-memory 7g

(I couldn't bump executor-memory to 8g; there is some restriction from the YARN configuration.)


You need to increase the driver memory. On a Mac (i.e. when running on a local master), the default driver memory is 1024M. By default, ~380MB is then allotted to the executor.


Upon increasing it to --driver-memory 2G, the executor memory increased to ~950MB.
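
For what it's worth, those numbers are roughly consistent with the unified memory manager used since Spark 1.6, where the UI's storage/execution figure is about (reported heap - 300 MB reserved) * spark.memory.fraction (0.6 by default in Spark 2.x). A small sketch, with the reported-heap values being my assumptions (the JVM reports a bit less than the nominal -Xmx):

# Rough sketch of the unified-memory arithmetic; reported-heap values are assumptions.
RESERVED_MB     = 300    # reserved system memory
MEMORY_FRACTION = 0.6    # spark.memory.fraction default in Spark 2.x

def storage_memory_mb(reported_heap_mb):
    return (reported_heap_mb - RESERVED_MB) * MEMORY_FRACTION

print(storage_memory_mb(956))    # ~394 MB, in the ballpark of the ~380MB above
print(storage_memory_mb(1911))   # ~967 MB, in the ballpark of the ~950MB above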


As far as I know, it is not possible to change spark.executor.memory at run time. If you are running a stand-alone version, with pyspark and graphframes, you can launch the pyspark REPL by executing the following command:

pyspark --driver-memory 2g --executor-memory 6g --packages graphframes:graphframes:0.7.0-spark2.4-s_2.11

Be sure to change the SPARK_VERSION environment variable appropriately, according to the version of Spark you are running.
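
As a quick smoke test inside that REPL (the toy vertex and edge DataFrames below are made up for illustration; in the Spark 2.x pyspark shell a SparkSession is already available as spark):

# Toy data, just to confirm the graphframes package resolved.
v = spark.createDataFrame([("a", "Alice"), ("b", "Bob")], ["id", "name"])
e = spark.createDataFrame([("a", "b", "follows")], ["src", "dst", "relationship"])

from graphframes import GraphFrame
g = GraphFrame(v, e)
g.inDegrees.show()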


Create a file called spark-env.sh in the spark/conf directory and add this line:

SPARK_EXECUTOR_MEMORY=2000m  # memory size you want to allocate for the executor