Spark: java.io.IOException: No space left on device

I am learning how to use Spark. I have a piece of code that inverts a matrix, and it works when the order of the matrix is small, like 100. But when the order of the matrix is large, like 2000, I get an exception like this:

15/05/10 20:31:00 ERROR DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /tmp/spark-local-20150510200122-effa/28/temp_shuffle_6ba230c3-afed-489b-87aa-91c046cadb22

java.io.IOException: No space left on device

In my program I have lots of lines like this:

val result1=matrix.map(...).reduce(...)
val result2=result1.map(...).reduce(...)
val result3=matrix.map(...)

(Sorry, the full code is too long to include here.)

So I think that each of these calls makes Spark create new RDDs, and my program creates so many of them that I get this exception. I am not sure if what I think is correct.

How can I delete the RDDs I won't use any more, like result1 and result2?

I have tried rdd.unpersist(), but it doesn't work.
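For context, unpersist() only drops an RDD's cached blocks; it does not delete the shuffle files Spark writes under spark.local.dir (/tmp by default), which is what fills up here. A minimal sketch of the usual pattern, with a placeholder RDD and placeholder operations standing in for the real matrix code:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(new SparkConf().setAppName("unpersist-sketch").setMaster("local[*]"))

val matrix = sc.parallelize(1 to 1000)                       // placeholder for the real matrix RDD

val result1 = matrix.map(_ * 2).persist(StorageLevel.MEMORY_AND_DISK)
val sum1    = result1.reduce(_ + _)                          // reduce returns a value, not an RDD

result1.unpersist()  // frees result1's cached blocks only; shuffle files on disk remain
sc.stop()            // temp/shuffle files are removed when the application shuts down

So unpersist() is the right call for cached RDDs, but it cannot reclaim the shuffle space that causes "No space left on device".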

Asked May 11 '15 by 赵祥宇


People also ask

What is the meaning of no space left on device?

A "No space left on device" error often means you are over quota in the directory you're trying to create or move files into.

Why does a job fail with no space left on device but DF says otherwise?

/tmp is usually the operating system's temporary output directory, shared by all OS users, and it is typically small and on a single disk. So when Spark runs many jobs, long jobs, or complex jobs, /tmp can fill up quickly, forcing Spark to throw "No space left on device" exceptions even though df reports free space on other, larger filesystems.


1 Answer

This happens because Spark creates temporary shuffle files under the /tmp directory of your local system. You can avoid this issue by setting the properties below in your Spark configuration files.

Set the following properties in spark-env.sh.
(Change the directories to whatever directories in your infrastructure have write permissions set and enough free space.)

SPARK_JAVA_OPTS+=" -Dspark.local.dir=/mnt/spark,/mnt2/spark -Dhadoop.tmp.dir=/mnt/ephemeral-hdfs"

export SPARK_JAVA_OPTS

You can also set the spark.local.dir property in $SPARK_HOME/conf/spark-defaults.conf, as stated by @EUgene below.
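For example, a spark-defaults.conf entry might look like this (the paths are placeholders; point them at directories with enough free space):

# $SPARK_HOME/conf/spark-defaults.conf
# Comma-separated list of scratch directories for shuffle and spill files
spark.local.dir    /mnt/spark,/mnt2/spark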

Answered Oct 06 '22 by rahul gulati