 

Setting PySpark executor.memory and executor.core within Jupyter Notebook


I am initializing PySpark from within a Jupyter Notebook as follows:

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

conf = SparkConf().setAppName("PySpark-testing-app").setMaster("yarn")
conf = (conf.set("deploy-mode", "client")
        .set("spark.driver.memory", "20g")
        .set("spark.executor.memory", "20g")
        .set("spark.driver.cores", "4")
        .set("spark.num.executors", "6")
        .set("spark.executor.cores", "4"))

sc = SparkContext(conf=conf)
sqlContext = SQLContext.getOrCreate(sc)

However, when I open the YARN GUI and look at "RUNNING Applications", I see my session allocated 1 container, 1 vCPU, and 1 GB of RAM, i.e. the default values! How can I get the values listed above to actually take effect?

asked Jul 30 '18 by TSAR


1 Answer

Jupyter Notebook launches PySpark in yarn-client mode, so the driver memory and some related configs cannot be set through the SparkConf class: by the time your code runs, the driver JVM has already started. You must set them on the command line instead.

Take a look at the official documentation's explanation of the memory setting:

Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-memory command line option or in your default properties file.
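
For reference, the properties-file route mentioned in the quote would look roughly like this (assuming the usual conf/spark-defaults.conf location under your Spark install; the 20g value just mirrors the question):

spark.driver.memory    20g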

There is another way to do it from within the notebook:

import os

# Must run before the SparkContext is created: the driver JVM reads these
# arguments at launch, so setting them afterwards has no effect.
memory = '20g'
pyspark_submit_args = ' --driver-memory ' + memory + ' pyspark-shell'
os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args

Other configs that have to be in place before the JVM starts should be passed the same way, as shown in the sketch below.
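
Putting it together, here is a minimal sketch that passes the driver and executor settings from the question through PYSPARK_SUBMIT_ARGS (the 20g / 4-core / 6-executor numbers simply mirror the question; adjust them to your cluster):

import os
from pyspark import SparkContext, SparkConf

# These flags are read when the driver JVM is launched, so they must be
# set before the SparkContext is created. Restart the kernel first if a
# context is already running.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--master yarn --deploy-mode client "
    "--driver-memory 20g "
    "--executor-memory 20g --executor-cores 4 --num-executors 6 "
    "pyspark-shell"
)

conf = SparkConf().setAppName("PySpark-testing-app")
sc = SparkContext(conf=conf)

# Verify what was actually applied.
print(sc.getConf().get("spark.executor.memory", "not set"))
print(sc.getConf().get("spark.executor.instances", "not set"))

The printed values should now match what you see under "RUNNING Applications" in the YARN GUI.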

answered Sep 23 '22 by Jack_H