I'm trying to build a recommender using Spark and just ran out of memory:
Exception in thread "dag-scheduler-event-loop" java.lang.OutOfMemoryError: Java heap space
I'd like to increase the memory available to Spark by modifying the spark.executor.memory
property, in PySpark, at runtime.
Is that possible? If so, how?
update
Inspired by the link in @zero323's comment, I tried to delete and recreate the context in PySpark:
del sc
from pyspark import SparkConf, SparkContext
conf = (SparkConf()
        .setMaster("spark://hadoop01.woolford.io:7077")
        .setAppName("recommender")
        .set("spark.executor.memory", "2g"))
sc = SparkContext(conf=conf)
returned:
ValueError: Cannot run multiple SparkContexts at once;
That's weird, since:
>>> sc
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'sc' is not defined
To enlarge the Spark shuffle service memory, modify SPARK_DAEMON_MEMORY in $SPARK_HOME/conf/spark-env.sh (the default value is 2g), and then restart the shuffle service for the change to take effect.
You can tell the JVM to start with, for example, 9g of driver memory by using SparkConf, or by putting the setting in your default properties file. You can also tell Spark to read its default settings from SPARK_CONF_DIR or $SPARK_HOME/conf, where the driver memory can be configured; Spark is fine with either approach.
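A rough sketch of the SparkConf route (the 9g value and the app name are placeholders, and this only takes effect if the driver JVM has not been launched yet):

from pyspark import SparkConf, SparkContext

# Driver memory must be fixed before the driver JVM starts, so set it
# on the SparkConf used to create the very first SparkContext.
conf = (SparkConf()
        .setAppName("example")               # placeholder app name
        .set("spark.driver.memory", "9g"))   # example value
sc = SparkContext(conf=conf)

The equivalent entry in $SPARK_HOME/conf/spark-defaults.conf would be the line spark.driver.memory 9g.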
You can also resolve it by making each shuffle partition smaller: increase the value of spark.sql.shuffle.partitions.
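For illustration, assuming an existing SparkSession named spark (the value 400 is arbitrary):

# More shuffle partitions means smaller partitions and less memory per task;
# 400 is only an example value (the default is 200).
spark.conf.set("spark.sql.shuffle.partitions", "400")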
I'm not sure why you chose the answer above when it requires restarting your shell and re-opening it with a different command! Though that works and is useful, there is an in-line solution, which is what was actually requested. This is essentially what @zero323 referenced in the comments above, but the link leads to a post describing the implementation in Scala. Below is a working implementation specifically for PySpark.
Note: The SparkContext you want to modify the settings for must not have been started, or else you will need to close it, modify the settings, and re-open it.
from pyspark import SparkContext
# The property must be set before the SparkContext is created
SparkContext.setSystemProperty('spark.executor.memory', '2g')
sc = SparkContext("local", "App Name")
source: https://spark.apache.org/docs/0.8.1/python-programming-guide.html
p.s. if you need to close the SparkContext just use:
SparkContext.stop(sc)
and to double-check the settings that are currently set, you can use:
sc._conf.getAll()
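Putting the note above together with the situation in the question, a sketch of closing a running context and re-opening it with new settings might look like this (the master URL, app name, and 2g value are taken from the question):

from pyspark import SparkConf, SparkContext

# Close the context that is already running (e.g. the shell's default sc)
sc.stop()

# Re-open it with the desired executor memory
conf = (SparkConf()
        .setMaster("spark://hadoop01.woolford.io:7077")
        .setAppName("recommender")
        .set("spark.executor.memory", "2g"))
sc = SparkContext(conf=conf)

# Double-check that the setting took effect
print(sc._conf.get("spark.executor.memory"))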
You could set spark.executor.memory when you start your pyspark shell:
pyspark --num-executors 5 --driver-memory 2g --executor-memory 2g
Citing this, after 2.0.0 you don't have to use SparkContext, but SparkSession with the conf method, as below:
spark.conf.set("spark.executor.memory", "2g")
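If the session has not been created yet, the same setting can also be applied while the session is built; a minimal sketch (the app name is just a placeholder):

from pyspark.sql import SparkSession

# Apply the executor memory setting while building the session
spark = (SparkSession.builder
         .appName("recommender")
         .config("spark.executor.memory", "2g")
         .getOrCreate())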