
Change default stack size for spark driver running from jupyter?

I'm running a Python script on a Spark cluster from Jupyter, and I want to change the driver's default stack size. The documentation says I can use spark.driver.extraJavaOptions to pass arbitrary options to the driver JVM, but it carries this note:

Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-java-options command line option or in your default properties file.
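For reference, the two mechanisms that note points to look roughly like this (the stack-size value and script name are illustrative):

# in conf/spark-defaults.conf (the default properties file)
spark.driver.extraJavaOptions  -Xss4M

# or as a spark-submit command-line option
spark-submit --driver-java-options "-Xss4M" my_script.py

Neither is convenient from a running Jupyter kernel, which is what prompts the question.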

The question is: how do I change this default driver parameter when running from Jupyter?

asked Sep 18 '25 by Mahmoud Hanafy

1 Answer

You can customize the Java options used for the driver by setting spark.driver.extraJavaOptions as a configuration value on the SparkConf, e.g.:

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("spark://spark-master:7077")
        .setAppName("MyApp")
        # -Xss4M raises the driver JVM's thread stack size to 4 MiB
        .set("spark.driver.extraJavaOptions", "-Xss4M"))
sc = SparkContext.getOrCreate(conf=conf)
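Once the context is up, you can read the value back from the active configuration to confirm it was applied (this uses the standard SparkContext.getConf API):

# sanity check: the option should appear in the live configuration
print(sc.getConf().get("spark.driver.extraJavaOptions"))  # expect: -Xss4M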

Note that http://spark.apache.org/docs/latest/configuration.html says the following about spark.driver.extraJavaOptions:

Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-java-options command line option or in your default properties file.

However, that warning applies to the JVM SparkConf class. When the option is set on the Python SparkConf in PySpark, it is passed as a command-line parameter to spark-submit, which applies it when instantiating the JVM, so that comment in the Spark docs does not apply here.
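If your notebook environment does start the JVM before your code runs (for example, a preconfigured PySpark kernel), a common workaround is to set the options through the PYSPARK_SUBMIT_ARGS environment variable before the first SparkContext is created; a minimal sketch, assuming a plain Python kernel:

import os

# must be set before any SparkContext exists in this process;
# the trailing "pyspark-shell" token is required by PySpark
os.environ["PYSPARK_SUBMIT_ARGS"] = "--driver-java-options -Xss4M pyspark-shell"

from pyspark import SparkContext
sc = SparkContext.getOrCreate()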

answered Sep 20 '25 by user1458424