I have a cluster of 4 datanodes, and the HDFS directory structure on each node is as below.
I am facing a disk space issue: as you can see, the /tmp folder in HDFS occupies the most space (217 GB). So I investigated the data in /tmp and found the following temp folders. Each of them contains part files of 10 GB to 20 GB in size. I want to clear this /tmp directory. Can anyone please tell me the consequences of deleting these tmp folders or part files? Will it affect my cluster?
The HDFS /tmp directory is mainly used as temporary storage during MapReduce operations. MapReduce artifacts, intermediate data, etc. are kept under this directory. These files are automatically cleared out when the MapReduce job completes.
The /tmp directory is a temporary landing place for files. Users also have write access to this directory, which can be a bad thing, but there is a solution.
mapred.system.dir is set by default to "${hadoop.tmp.dir}/mapred/system", and this defines the path on HDFS where the Map/Reduce framework stores system files.
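If you want to move these system files off the default location, you can override the property in mapred-site.xml. A minimal sketch; the target path below is an example, not a recommendation:

```xml
<!-- mapred-site.xml: override where the Map/Reduce framework keeps
     system files. The /mapred/system value here is an example path. -->
<property>
  <name>mapred.system.dir</name>
  <value>/mapred/system</value>
</property>
```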
By default, files in the local filesystem's /tmp/ are cleaned up after 10 days, and those in /var/tmp after 30 days.
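You can reproduce that age-based cleanup by hand with find. A runnable sketch against a scratch directory (use a real path only once you have verified the selection); -mtime +10 matches files not modified in the last 10 days:

```shell
#!/bin/sh
# Demonstrate age-based cleanup in a throwaway scratch directory.
SCRATCH=$(mktemp -d)
touch "$SCRATCH/fresh.txt"
# Backdate one file's mtime by 11 days (GNU touch syntax)
touch -d "11 days ago" "$SCRATCH/stale.txt"
# Delete files not modified within the last 10 days
find "$SCRATCH" -type f -mtime +10 -delete
ls "$SCRATCH"
```

Only fresh.txt should remain afterwards. Note this applies to the local filesystem; for HDFS paths you would inspect and delete with `hdfs dfs` commands instead.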
As noted above, these files are cleared automatically when a MapReduce job completes. If you delete them manually, you can affect currently running MapReduce jobs.
Pig also creates temporary files, which it deletes at the end of script execution. However, Pig does not delete them if the script fails or is killed; you have to handle that situation yourself. It is best to handle this clean-up activity in the script itself.
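One way to guarantee the clean-up is a wrapper with a trap, so the temp directory is removed even when the job fails. A sketch; the `pig` invocation and HDFS delete are assumptions for a real cluster, and the demo below simulates a failing script with a local directory so you can dry-run it:

```shell
#!/bin/sh
# Wrapper sketch: remove Pig's temp dir even if the script fails or is killed.
PIG_TMP=$(mktemp -d)   # on a cluster this would be an HDFS path
(
  # On a real cluster the cleanup would be:
  #   hdfs dfs -rm -r -f -skipTrash "$PIG_TMP"
  trap 'rm -rf "$PIG_TMP"' EXIT INT TERM
  # Real job would be: pig -Dpig.temp.dir="$PIG_TMP" yourscript.pig
  exit 1   # simulate the Pig script failing
)
if [ ! -e "$PIG_TMP" ]; then
  echo "temp dir cleaned up despite failure"
fi
```

Setting `pig.temp.dir` keeps Pig's intermediate data in a directory you control, so a periodic sweep of leftovers is also straightforward.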
The following article gives you a good understanding:
http://www.lopakalogic.com/articles/hadoop-articles/pig-keeps-temp-files/