
Why does a job fail with "No space left on device", but df says otherwise?

Tags:

apache-spark

When performing a shuffle my Spark job fails and says "no space left on device", but when I run df -h it says I have free space left! Why does this happen, and how can I fix it?

asked Sep 07 '14 by samthebest



2 Answers

By default Spark uses the /tmp directory to store intermediate shuffle data. If you do have free space on some other device, you can point Spark there by creating the file SPARK_HOME/conf/spark-defaults.conf (where SPARK_HOME is the root directory of your Spark install) and adding the line:

spark.local.dir                     SOME/DIR/WHERE/YOU/HAVE/SPACE 
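If you would rather not edit the defaults file, the same setting can be passed per job at submit time. This is a sketch: the directory and script name below are hypothetical placeholders, not values from the question.

```shell
# Same setting, passed for one job only.
# /mnt/bigdisk/spark-tmp is a hypothetical directory on a
# device with plenty of free space and free inodes.
spark-submit \
  --conf spark.local.dir=/mnt/bigdisk/spark-tmp \
  my_job.py
```

Note that spark.local.dir accepts a comma-separated list of directories, so you can also spread the shuffle I/O across several disks.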
answered Sep 21 '22 by quine


You also need to monitor df -i, which shows how many inodes are in use.
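A quick way to compare both views at once on a worker node (assuming /tmp is still the scratch directory, since that is Spark's default spark.local.dir):

```shell
# Disk blocks can look plentiful while inodes are exhausted.
df -h /tmp    # free space in bytes -- can look fine...
df -i /tmp    # ...while the IUse% column is at 100%
```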

On each machine, we create M * R temporary files for shuffle, where M = number of map tasks, R = number of reduce tasks.

https://spark-project.atlassian.net/browse/SPARK-751
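To see why the M * R formula bites, plug in some hypothetical (not from the question) job sizes:

```shell
# Hypothetical job: 2000 map tasks, 500 reduce tasks.
M=2000
R=500
# Shuffle files created per machine:
echo $((M * R))   # -> 1000000
```

A million tiny files is easily enough to exhaust the inode table on a small /tmp filesystem long before its blocks fill up.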

If you do indeed see that disks are running out of inodes, you can fix the problem in a few ways:

  • Decrease partitions (see coalesce with shuffle = false).
  • One can drop the number of files to O(R) by “consolidating files”. Because different file systems behave differently, it’s recommended that you read up on spark.shuffle.consolidateFiles and see https://spark-project.atlassian.net/secure/attachment/10600/Consolidating%20Shuffle%20Files%20in%20Spark.pdf.
  • Sometimes you may simply find that you need your DevOps to increase the number of inodes the FS supports.
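For the consolidation option (which, per the EDIT below, only applies to Spark versions before 1.6), the setting goes in the same spark-defaults.conf file as spark.local.dir:

```shell
# Pre-1.6 Spark only: merge per-task shuffle outputs so each
# machine keeps O(R) shuffle files instead of O(M * R).
spark.shuffle.consolidateFiles      true
```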

EDIT

File consolidation has been removed from Spark as of version 1.6: https://issues.apache.org/jira/browse/SPARK-9808

answered Sep 21 '22 by samthebest