Spark: java.io.IOException: No space left on device

I am learning how to use Spark. I have a piece of code that inverts a matrix, and it works when the order of the matrix is small, like 100. But when the order of the matrix is large, like 2000, I get an exception like this:

15/05/10 20:31:00 ERROR DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /tmp/spark-local-20150510200122-effa/28/temp_shuffle_6ba230c3-afed-489b-87aa-91c046cadb22

java.io.IOException: No space left on device

In my program I have lots of lines like this:

val result1=matrix.map(...).reduce(...)
val result2=result1.map(...).reduce(...)
val result3=matrix.map(...)

(Sorry, the full code is too long to include here.)

So I think that each of these calls makes Spark create new RDDs, and my program creates so many of them that I get this exception. I am not sure if what I think is correct.

How can I delete the RDDs I won't use any more, like result1 and result2?

I have tried rdd.unpersist(), but it doesn't work.
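For context, unpersist() only drops an RDD's cached blocks; it does not delete the shuffle files Spark writes under spark.local.dir (/tmp by default), which is what fills up here. A minimal sketch of the usual pattern, with a placeholder RDD and placeholder operations standing in for the real matrix code:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(new SparkConf().setAppName("unpersist-sketch").setMaster("local[*]"))

val matrix = sc.parallelize(1 to 1000)                       // placeholder for the real matrix RDD

val result1 = matrix.map(_ * 2).persist(StorageLevel.MEMORY_AND_DISK)
val sum1    = result1.reduce(_ + _)                          // reduce returns a value, not an RDD

result1.unpersist()  // frees result1's cached blocks only; shuffle files on disk remain
sc.stop()            // temp/shuffle files are removed when the application shuts down

So unpersist() is the right call for cached RDDs, but it cannot reclaim the shuffle space that causes "No space left on device".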

Asked May 11 '15 by 赵祥宇


People also ask

What is the meaning of no space left on device?

A "No space left on device" error often means you are over quota in the directory you're trying to create or move files into.

Why does a job fail with no space left on device but DF says otherwise?

/tmp is usually the operating system's temporary output directory, shared by all OS users, and it is typically small and on a single disk. So when Spark runs many jobs, long jobs, or complex jobs, /tmp can fill up quickly, forcing Spark to throw "No space left on device" exceptions even though df reports free space on other, larger filesystems.


1 Answer

This happens because Spark creates temporary shuffle files under the /tmp directory of your local system. You can avoid this issue by setting the properties below in your Spark configuration files.

Set the following properties in spark-env.sh.
(Change the directories to whatever directories in your infrastructure have write permissions set and enough free space.)

SPARK_JAVA_OPTS+=" -Dspark.local.dir=/mnt/spark,/mnt2/spark -Dhadoop.tmp.dir=/mnt/ephemeral-hdfs"

export SPARK_JAVA_OPTS

You can also set the spark.local.dir property in $SPARK_HOME/conf/spark-defaults.conf, as stated by @EUgene below.
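For example, a spark-defaults.conf entry might look like this (the paths are placeholders; point them at directories with enough free space):

# $SPARK_HOME/conf/spark-defaults.conf
# Comma-separated list of scratch directories for shuffle and spill files
spark.local.dir    /mnt/spark,/mnt2/spark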

Answered Oct 06 '22 by rahul gulati