I have been using PySpark with IPython lately on my server with 24 CPUs and 32 GB of RAM. It runs on a single machine only. In my process, I want to collect a huge amount of data, as shown in the code below:
train_dataRDD = (train
    .map(lambda x: getTagsAndText(x))
    .filter(lambda x: x[-1] != [])
    .flatMap(lambda (x, text, tags): [(tag, (x, text)) for tag in tags])
    .groupByKey()
    .mapValues(list))
When I do

training_data = train_dataRDD.collectAsMap()

it gives me an OutOfMemoryError: Java heap space. After that I also cannot perform any operations on Spark, because it loses the connection with Java and raises Py4JNetworkError: Cannot connect to the java server.
It looks like the heap space is too small. How can I set it to a bigger limit?
EDIT:
Things that I tried before running:

sc._conf.set('spark.executor.memory', '32g').set('spark.driver.memory', '32g').set('spark.driver.maxResultsSize', '0')

I changed the Spark options as per the documentation here (Ctrl-F and search for spark.executor.extraJavaOptions): http://spark.apache.org/docs/1.2.1/configuration.html

It says that I can avoid OOMs by setting the spark.executor.memory option. I did the same thing, but it does not seem to be working.
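One likely reason the calls above have no effect: by the time sc._conf is modified, the driver JVM has already started, so heap-related settings such as spark.driver.memory are simply ignored. Below is a minimal sketch of supplying the memory before the SparkContext (and its JVM) is created; the 24g figure, the local[24] master and the use of PYSPARK_SUBMIT_ARGS are illustrative assumptions, not part of the original question:

import os

# Assumption: heap settings must be passed at launch time, before the JVM exists.
# The trailing 'pyspark-shell' token is required by newer PySpark versions.
os.environ['PYSPARK_SUBMIT_ARGS'] = '--driver-memory 24g pyspark-shell'

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster('local[24]')  # 24 cores on one machine (assumption)
sc = SparkContext(conf=conf)

Launching the shell with pyspark --driver-memory 24g, or editing spark-defaults.conf as in the answer below, achieves the same thing.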
A java.lang.OutOfMemoryError: Java heap space exception is thrown when there is insufficient space to allocate an object in the Java heap: the garbage collector cannot make enough room for a new object and the heap cannot be expanded any further. In plain Java an easy way to solve the error is to raise the maximum heap size with the JVM option -Xmx (for example -Xmx512M). In Spark you can also mitigate shuffle-related OOMs by increasing the partition count, e.g. by raising spark.sql.shuffle.partitions for DataFrame/SQL workloads.
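Note that spark.sql.shuffle.partitions only affects DataFrame/SQL shuffles; for the RDD pipeline in the question the analogous knob is the numPartitions argument of groupByKey (or spark.default.parallelism). A hedged sketch, reusing the question's names with an arbitrary partition count:

# Sketch: spread the groupByKey shuffle across more, smaller partitions so
# each task keeps less data in memory. 200 partitions is an assumption.
train_dataRDD = (train
    .map(getTagsAndText)
    .filter(lambda x: x[-1] != [])
    .flatMap(lambda rec: [(tag, (rec[0], rec[1])) for tag in rec[2]])
    .groupByKey(numPartitions=200)
    .mapValues(list))

Keep in mind that more partitions only relieves executor-side pressure; collectAsMap() still materialises the whole result on the driver, which is why the driver heap setting in the answer below is what ultimately matters here.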
After trying out loads of configuration parameters, I found that only one of them needed to be changed to enable more heap space: spark.driver.memory.
sudo vim $SPARK_HOME/conf/spark-defaults.conf
# uncomment the spark.driver.memory line and change it according to your use case. I changed it to:
spark.driver.memory 15g
# press : and then wq! to exit the vim editor
Close your existing Spark application and rerun it. You will not encounter this error again. :)
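To check that the new value was actually picked up after relaunching, you can read the setting back the same way the question pokes at sc._conf; a small sketch (the '15g' result assumes the edited spark-defaults.conf was loaded):

# Sketch: confirm the relaunched driver read spark.driver.memory from spark-defaults.conf
print(sc._conf.get('spark.driver.memory'))    # should print '15g'
training_data = train_dataRDD.collectAsMap()  # the collect that previously failed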