I'm trying to build a recommender using Spark and just ran out of memory:
Exception in thread "dag-scheduler-event-loop" java.lang.OutOfMemoryError: Java heap space
I'd like to increase the memory available to Spark by modifying the spark.executor.memory
property, in PySpark, at runtime.
Is that possible? If so, how?
update
Inspired by the link in @zero323's comment, I tried to delete and recreate the context in PySpark:
del sc
from pyspark import SparkConf, SparkContext
conf = (SparkConf()
        .setMaster("spark://hadoop01.woolford.io:7077")
        .setAppName("recommender")
        .set("spark.executor.memory", "2g"))
sc = SparkContext(conf=conf)
returned:
ValueError: Cannot run multiple SparkContexts at once;
That's weird, since:
>>> sc
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'sc' is not defined
To enlarge the Spark shuffle service memory, modify SPARK_DAEMON_MEMORY in $SPARK_HOME/conf/spark-env.sh (the default value is 2g), and then restart the shuffle service for the change to take effect.
You can tell the JVM to start with, for example, 9g of driver memory by using SparkConf, or by putting the setting in your default properties file. You can also tell Spark to read its default settings from SPARK_CONF_DIR or $SPARK_HOME/conf, where the driver memory can be configured; Spark is fine with either approach.
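A rough sketch of the SparkConf route (the 9g value and the app name are placeholders, and this only takes effect if the driver JVM has not been launched yet):

from pyspark import SparkConf, SparkContext

# Driver memory must be fixed before the driver JVM starts, so set it
# on the SparkConf used to create the very first SparkContext.
conf = (SparkConf()
        .setAppName("example")               # placeholder app name
        .set("spark.driver.memory", "9g"))   # example value
sc = SparkContext(conf=conf)

The equivalent entry in $SPARK_HOME/conf/spark-defaults.conf would be the line spark.driver.memory 9g.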
You can also resolve it by making each shuffle partition smaller: increase the value of spark.sql.shuffle.partitions.
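For illustration, assuming an existing SparkSession named spark (the value 400 is arbitrary):

# More shuffle partitions means smaller partitions and less memory per task;
# 400 is only an example value (the default is 200).
spark.conf.set("spark.sql.shuffle.partitions", "400")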
I'm not sure why you chose the answer above when it requires restarting your shell and re-opening it with a different command! Though that works and is useful, there is an in-line solution, which is what was actually requested. This is essentially what @zero323 referenced in the comments above, but the link leads to a post describing the implementation in Scala. Below is a working implementation specifically for PySpark.
Note: The SparkContext you want to modify the settings for must not have been started, or else you will need to close it, modify the settings, and re-open it.
from pyspark import SparkContext
# The property must be set before the SparkContext is created
SparkContext.setSystemProperty('spark.executor.memory', '2g')
sc = SparkContext("local", "App Name")
source: https://spark.apache.org/docs/0.8.1/python-programming-guide.html
p.s. if you need to close the SparkContext just use:
SparkContext.stop(sc)
and to double-check the settings that are currently set, you can use:
sc._conf.getAll()
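Putting the note above together with the situation in the question, a sketch of closing a running context and re-opening it with new settings might look like this (the master URL, app name, and 2g value are taken from the question):

from pyspark import SparkConf, SparkContext

# Close the context that is already running (e.g. the shell's default sc)
sc.stop()

# Re-open it with the desired executor memory
conf = (SparkConf()
        .setMaster("spark://hadoop01.woolford.io:7077")
        .setAppName("recommender")
        .set("spark.executor.memory", "2g"))
sc = SparkContext(conf=conf)

# Double-check that the setting took effect
print(sc._conf.get("spark.executor.memory"))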
You could set spark.executor.memory when you start your pyspark shell:
pyspark --num-executors 5 --driver-memory 2g --executor-memory 2g
Citing this, after 2.0.0 you don't have to use SparkContext, but SparkSession with the conf method, as below:
spark.conf.set("spark.executor.memory", "2g")
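If the session has not been created yet, the same setting can also be applied while the session is built; a minimal sketch (the app name is just a placeholder):

from pyspark.sql import SparkSession

# Apply the executor memory setting while building the session
spark = (SparkSession.builder
         .appName("recommender")
         .config("spark.executor.memory", "2g")
         .getOrCreate())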