 

What is the /tmp directory in Hadoop HDFS?

I have a cluster of 4 datanodes, and the HDFS structure on each node is as below:

[screenshot: HDFS directory structure on each datanode]

I am facing a disk space issue: as you can see, the /tmp folder in HDFS occupies the most space (217 GB). I investigated the data in /tmp and found the temp folders below; each contains part files of 10 GB to 20 GB in size. I want to clear this /tmp directory. Can anyone please let me know the consequences of deleting these temp folders or part files? Will it affect my cluster?
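For investigations like this, the standard Hadoop CLI can report usage per directory without touching any data. A read-only sketch (guarded so it only runs where an hdfs client is configured):

```shell
# Read-only inspection of HDFS /tmp usage; requires a configured Hadoop
# client, so guard for machines without one.
if command -v hdfs >/dev/null 2>&1; then
  # Size of each entry directly under /tmp, human-readable:
  hdfs dfs -du -h /tmp
  # Single summary total for /tmp:
  hdfs dfs -du -s -h /tmp
else
  echo "hdfs CLI not found; run this on a cluster node"
fi
```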

[screenshot: temp folders found under /tmp]

asked Jul 22 '14 by sandip divekar


People also ask

What is tmp in HDFS?

The HDFS /tmp directory is mainly used as temporary storage during MapReduce operations. MapReduce artifacts, intermediate data, etc. are kept under this directory, and these files are automatically cleared out when the MapReduce job completes.

What is tmp directory used for?

The /tmp directory is a temporary landing place for files. All users also have write access to this directory, which can be a bad thing, but there are ways to manage it.

Where is Hadoop tmp dir set?

mapred.system.dir is set by default to "${hadoop.tmp.dir}/mapred/system", and this defines the path on HDFS where the Map/Reduce framework stores system files. hadoop.tmp.dir itself is configured in core-site.xml.

How often is the tmp directory cleared?

By default, files in /tmp/ are cleaned up after 10 days, and those in /var/tmp after 30 days.


1 Answer

The HDFS /tmp directory is mainly used as temporary storage during MapReduce operations. MapReduce artifacts, intermediate data, etc. are kept under this directory, and these files are automatically cleared out when the MapReduce job completes. If you delete these temporary files, you can affect currently running MapReduce jobs.
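If you do need to reclaim space, a cautious approach is to delete only entries that are clearly stale. A minimal sketch, where the 7-day retention period is an assumption and the actual delete is left commented out until you have confirmed (e.g. with mapred job -list) that no running job needs the data:

```shell
#!/usr/bin/env bash
# Sketch: identify stale entries under HDFS /tmp. RETENTION_DAYS and the
# target path are assumptions -- adapt them to your cluster's job patterns.
set -u

RETENTION_DAYS=7

# is_older_than FILE_DATE DAYS: true when FILE_DATE (YYYY-MM-DD) is more
# than DAYS days in the past. Lexicographic compare is valid for ISO dates.
is_older_than() {
  local file_date=$1 days=$2 cutoff
  cutoff=$(date -d "-${days} days" +%Y-%m-%d)
  [[ "$file_date" < "$cutoff" ]]
}

# Requires an hdfs client; skip gracefully where none is installed.
if command -v hdfs >/dev/null 2>&1; then
  # hdfs dfs -ls prints: perm repl owner group size date time path
  hdfs dfs -ls /tmp | awk 'NR>1 {print $6, $8}' | while read -r mod_date path; do
    if is_older_than "$mod_date" "$RETENTION_DAYS"; then
      echo "stale: $path (last modified $mod_date)"
      # After verifying no running job still uses it, uncomment:
      # hdfs dfs -rm -r -skipTrash "$path"
    fi
  done
fi
```

-skipTrash frees the blocks immediately instead of moving them into the HDFS trash directory, which matters when the whole point is to recover disk space.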

These temporary files are created by Pig, and they are normally deleted at the end of the script. However, Pig does not delete them if the script execution fails or is killed, so you have to handle that situation yourself. It is best to handle this temp-file cleanup in the script itself.
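One way to make that cleanup unconditional is a shell wrapper that removes the temp directory in an EXIT trap, so it fires even when the Pig script fails or is interrupted. A sketch, where the temp path and my_script.pig are hypothetical names to adapt:

```shell
#!/usr/bin/env bash
# Sketch: run a Pig script with a dedicated HDFS temp dir and guarantee
# cleanup via a trap. PIG_TEMP and my_script.pig are hypothetical names.
set -u

PIG_TEMP="/tmp/pig_temp_$$"   # per-run temp directory on HDFS

cleanup() {
  # Fires on normal exit, script failure, or SIGINT/SIGTERM.
  if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -rm -r -f -skipTrash "$PIG_TEMP" 2>/dev/null || true
  fi
}
trap cleanup EXIT INT TERM

# pig.temp.dir tells Pig where to write intermediate data on HDFS.
if command -v pig >/dev/null 2>&1; then
  pig -Dpig.temp.dir="$PIG_TEMP" my_script.pig
fi
```

The trap runs regardless of how the wrapper exits, which is exactly the failure case Pig itself does not cover.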

The following article gives a good understanding:

http://www.lopakalogic.com/articles/hadoop-articles/pig-keeps-temp-files/

answered Oct 13 '22 by SachinJ