
PySpark: java.lang.OutofMemoryError: Java heap space

I have been using PySpark with IPython lately on my server with 24 CPUs and 32 GB of RAM. It runs on only one machine. In my process, I want to collect a huge amount of data, as shown in the code below:

train_dataRDD = (train
    .map(lambda x: getTagsAndText(x))                                      # extract (id, text, tags) from each record
    .filter(lambda x: x[-1] != [])                                         # drop records with no tags
    .flatMap(lambda (x, text, tags): [(tag, (x, text)) for tag in tags])   # one (tag, (id, text)) pair per tag
    .groupByKey()
    .mapValues(list))

When I do

training_data =  train_dataRDD.collectAsMap() 

It gives me a java.lang.OutOfMemoryError: Java heap space. Also, I cannot perform any operations on Spark after this error, as it loses the connection with Java. It gives Py4JNetworkError: Cannot connect to the java server.

It looks like the heap space is too small. How can I set it to a bigger limit?

EDIT:

Things I tried before running:

sc._conf.set('spark.executor.memory','32g').set('spark.driver.memory','32g').set('spark.driver.maxResultsSize','0')

I changed the Spark options as per the documentation here (Ctrl-F and search for spark.executor.extraJavaOptions): http://spark.apache.org/docs/1.2.1/configuration.html

It says that I can avoid OOMs by setting the spark.executor.memory option. I did that, but it does not seem to work.
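One thing that may matter here, as a hedged sketch rather than a definitive fix: settings applied to sc._conf after the SparkContext already exists are not picked up by the running JVM. If you create the context yourself from a plain Python script, you can pass the settings through a SparkConf before the context is built; if you start from the pyspark/IPython shell, the driver JVM is already up, so the value has to come from the command line or spark-defaults.conf instead. The memory values below are illustrative, not recommendations:

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("local[24]")                   # single machine, all 24 cores
        .set("spark.driver.memory", "16g")        # heap for the driver JVM (illustrative value)
        .set("spark.driver.maxResultSize", "0"))  # no limit on results collected to the driver
sc = SparkContext(conf=conf)                      # settings must be in place before this line

In local mode the driver and the executors share one JVM, so the driver memory is the setting that governs the heap used by collectAsMap().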

asked Sep 01 '15 by pg2455

People also ask

How do I fix java.lang.OutOfMemoryError: Java heap space?

An easy way to solve an OutOfMemoryError: Java heap space is to increase the maximum heap size using the JVM option -Xmx (e.g. "-Xmx512M"); this will usually resolve the error immediately.
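For a standalone Java program the flag goes on the java command line; for PySpark the equivalent knob is the driver (and executor) memory setting. A rough illustration, where the jar name and sizes are placeholders:

java -Xmx4g -jar myapp.jar       # 4 GB max heap for a plain Java program
pyspark --driver-memory 16g      # rough PySpark equivalent for the driver JVM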

How do you fix an out-of-memory error in PySpark?

You can resolve it by adjusting the partitioning: increase the value of spark.sql.shuffle.partitions.
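Note that spark.sql.shuffle.partitions only affects the DataFrame/SQL API; for RDD code like the question's, the analogous lever is an explicit partition count on the shuffle operation. A rough sketch, where pairs_rdd and the counts are illustrative:

spark.conf.set("spark.sql.shuffle.partitions", "400")   # DataFrame/SQL side (Spark 2+ SparkSession)
grouped = pairs_rdd.groupByKey(numPartitions=400)       # RDD side: more, smaller shuffle partitions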

What causes java.lang.OutOfMemoryError: Java heap space?

This is the java.lang.OutOfMemoryError exception. Usually, this error is thrown when there is insufficient space to allocate an object in the Java heap: the garbage collector cannot make space available to accommodate a new object, and the heap cannot be expanded any further.


1 Answer

After trying out loads of configuration parameters, I found that only one needs to be changed to get more heap space: spark.driver.memory.

sudo vim $SPARK_HOME/conf/spark-defaults.conf
# uncomment spark.driver.memory and change it according to your use; I changed it to the value below
spark.driver.memory 15g
# press : and then wq! to exit the vim editor

Close your existing Spark application and re-run it. You will not encounter this error again. :)
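If you would rather not edit spark-defaults.conf, the same setting can also be passed on the command line when starting the shell, and you can confirm what the driver JVM actually got once it is up:

pyspark --driver-memory 15g            # same effect as the spark-defaults.conf entry
sc._conf.get("spark.driver.memory")    # inside the shell; should return '15g'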

answered Sep 17 '22 by pg2455