I am initializing PySpark from within a Jupyter Notebook as follows:
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
conf = SparkConf().setAppName("PySpark-testing-app").setMaster("yarn")
conf = (conf.set("deploy-mode","client")
.set("spark.driver.memory","20g")
.set("spark.executor.memory","20g")
.set("spark.driver.cores","4")
.set("spark.num.executors","6")
.set("spark.executor.cores","4"))
sc = SparkContext(conf=conf)
sqlContext = SQLContext.getOrCreate(sc)
However, when I open the YARN GUI and look at "RUNNING Applications", I see that my session has been allocated only 1 container, 1 vCPU, and 1 GB of RAM, i.e. the default values! How can I get the desired allocation by passing the values listed above?
The cores property controls the number of concurrent tasks an executor can run. --executor-cores 5 means that each executor can run a maximum of five tasks at the same time.
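As a side note, the executor-side settings can generally still be applied through SparkConf in client mode; the executor count, however, is controlled by spark.executor.instances rather than the spark.num.executors key used in the question. A minimal sketch (the resource values are only placeholders):

from pyspark import SparkConf, SparkContext

# Executor-side settings can be applied programmatically before the
# SparkContext is created; only the driver settings hit the
# client-mode limitation explained below.
conf = (SparkConf()
        .setAppName("PySpark-testing-app")
        .setMaster("yarn")
        .set("spark.executor.memory", "20g")
        .set("spark.executor.cores", "4")        # max concurrent tasks per executor
        .set("spark.executor.instances", "6"))   # note: not "spark.num.executors"

sc = SparkContext(conf=conf)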
Jupyter Notebook launches PySpark in yarn-client mode, so the driver memory and some other configs cannot be set through the SparkConf class; you must set them on the command line.
Take a look at the official docs' explanation of the memory setting:
Note: In client mode, this config must not be set through the SparkConf
directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-memory command
line option or in your default properties file.
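You can confirm whether a value actually took effect by asking the running context for its effective configuration, for example:

# Prints the effective value the driver is actually using
# (falls back to the 1g default if nothing was applied).
print(sc.getConf().get("spark.driver.memory", "1g"))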
There is another way to make this work:
import os
# PYSPARK_SUBMIT_ARGS must be set before the SparkContext (and its driver JVM) is created.
memory = '20g'
pyspark_submit_args = ' --driver-memory ' + memory + ' pyspark-shell'
os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args
Other configs that have to be fixed at launch time should be passed in the same way, as in the sketch below.
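For example, a sketch that pushes the values from the question through PYSPARK_SUBMIT_ARGS (the resource numbers are simply the ones from the question; --driver-cores is omitted because it only applies in cluster mode):

import os

# Must be assembled before the SparkContext (and hence the driver JVM) starts.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--master yarn "
    "--driver-memory 20g "
    "--executor-memory 20g "
    "--executor-cores 4 "
    "--num-executors 6 "
    "pyspark-shell"
)

from pyspark import SparkContext
sc = SparkContext()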