Spark 1.4 increase maxResultSize memory

I am using Spark 1.4 for my research and am struggling with the memory settings. My machine has 16 GB of memory, so that should not be a problem, since my file is only 300 MB. However, when I try to convert a Spark RDD to a pandas DataFrame using the toPandas() function, I receive the following error:

serialized results of 9 tasks (1096.9 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)

I tried to fix this by changing the spark-config file, but I still get the same error. I've heard that this is a problem with Spark 1.4 and am wondering if you know how to solve it. Any help is much appreciated.

asked Jun 25 '15 18:06

ahajib

1 Answer

You can set the spark.driver.maxResultSize parameter in the SparkConf object:

    from pyspark import SparkConf, SparkContext

    # In Jupyter you have to stop the current context first
    sc.stop()

    # Create new config
    conf = (SparkConf()
        .set("spark.driver.maxResultSize", "2g"))

    # Create new context
    sc = SparkContext(conf=conf)

You should probably create a new SQLContext as well:

    from pyspark.sql import SQLContext

    sqlContext = SQLContext(sc)
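As a rough sanity check on the value "2g", the size reported in the error can be compared with a candidate limit. The sketch below is plain Python and does not touch Spark. The helper names limit_to_mb and result_size_from_error are hypothetical, and the unit handling (binary multiples with k, m, g, t suffixes) is an assumption about how Spark reads size strings.

```python
import re

# Binary multipliers, assumed to match Spark's size suffixes (k, m, g, t).
_UNITS_MB = {"k": 1.0 / 1024, "m": 1.0, "g": 1024.0, "t": 1024.0 * 1024}


def limit_to_mb(limit):
    """Convert a size string such as "2g" or "1024m" to mebibytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([kmgt])b?\s*", limit.lower())
    if match is None:
        raise ValueError("unrecognized size string: %r" % limit)
    return int(match.group(1)) * _UNITS_MB[match.group(2)]


def result_size_from_error(message):
    """Extract the reported result size (in MB) from the error text."""
    match = re.search(r"\(([\d.]+) MB\) is bigger than", message)
    if match is None:
        raise ValueError("message does not look like a maxResultSize error")
    return float(match.group(1))


# Error text copied from the question.
error = ("serialized results of 9 tasks (1096.9 MB) is bigger than "
         "spark.driver.maxResultSize (1024.0 MB)")

needed_mb = result_size_from_error(error)
for candidate in ("1g", "2g"):
    fits = limit_to_mb(candidate) > needed_mb
    print(candidate, "is enough" if fits else "is too small")
```

The error only reports the size accumulated over 9 tasks, so the full result may well be larger. It is safer to treat that figure as a lower bound and leave some headroom when picking the limit.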

answered Sep 24 '22 08:09

zero323


Tags:

python

memory

jupyter

apache-spark

pyspark
