Delete files older than 10 days on HDFS


Is there a way to delete files older than 10 days on HDFS?

In Linux I would use:

find /path/to/directory/ -type f -mtime +10 -name '*.txt' -execdir rm -- {} \;

Is there an equivalent for HDFS? (Deletion should be based on the file creation date.)

Ani Menon asked May 29 '17 05:05


People also ask

How do I delete multiple files in HDFS?

The hadoop fs shell provides an rm command. It works like the Linux rm command and removes a file from HDFS. To delete recursively, use -rm -r; the older -rmr form is deprecated.

What is expunge in HDFS?

expunge: This command empties the trash of an HDFS file system. Syntax: $ hadoop fs -expunge

Which command is used for removing directory from HDFS?

Log into the Hadoop NameNode using the database administrator's account and use HDFS's -rm -r command (formerly -rmr) to delete the directories.


2 Answers

How about this:

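hdfs dfs -ls /tmp | tr -s " " | cut -d' ' -f6-8 | grep "^[0-9]" | awk '
BEGIN { MIN = 14400; LAST = 60 * MIN          # 14400 minutes = 10 days, held in seconds
        "date +%s" | getline NOW }
{
    cmd = "date -d'\''" $1 " " $2 "'\'' +%s"   # modification time as epoch seconds
    cmd | getline WHEN
    close(cmd)                                 # otherwise a repeated timestamp leaves WHEN stale
    DIFF = NOW - WHEN
    if (DIFF > LAST) { print "Deleting: " $3; system("hdfs dfs -rm -r " $3) }
}'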

A detailed description is here.
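
To try the age test without deleting anything, the filter step of the pipeline above can be pulled out into a small shell function. This is only a sketch under assumptions: older_than and NOW_EPOCH are hypothetical names, GNU date (-d) is assumed as in the answer, and the listing is assumed to have the usual eight columns with the modification date and time in columns 6 and 7.

```shell
# older_than DAYS: read `hdfs dfs -ls` output on stdin and print the paths
# whose modification time is more than DAYS days in the past.
# (Hypothetical helper; NOW_EPOCH optionally overrides "now" for testing.)
older_than() (
    days=$1
    now=${NOW_EPOCH:-$(date +%s)}
    cutoff=$(( now - days * 24 * 60 * 60 ))
    # The last variable (path) receives the rest of the line, so paths
    # containing spaces are kept intact.
    while read -r perms repl owner group size day time path; do
        # Skip the "Found N items" header and any row without a date column.
        case $day in
            [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
            *) continue ;;
        esac
        when=$(date -d "$day $time" +%s) || continue
        if [ "$when" -lt "$cutoff" ]; then
            printf '%s\n' "$path"
        fi
    done
)

# Possible usage (review the printed list before deleting anything):
#   hdfs dfs -ls /tmp | older_than 10
```

Because the function only prints paths, its output can be inspected first and then passed to hdfs dfs -rm once it looks right.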

PradeepKumbhar answered Oct 18 '22 11:10


Solution 1: Using multiple commands as answered by daemon12

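hdfs dfs -ls /file/Path | tr -s " " | cut -d' ' -f6-8 | grep "^[0-9]" | awk '
BEGIN { MIN = 14400; LAST = 60 * MIN          # 14400 minutes = 10 days, held in seconds
        "date +%s" | getline NOW }
{
    cmd = "date -d'\''" $1 " " $2 "'\'' +%s"   # modification time as epoch seconds
    cmd | getline WHEN
    close(cmd)                                 # otherwise a repeated timestamp leaves WHEN stale
    DIFF = NOW - WHEN
    if (DIFF > LAST) { print "Deleting: " $3; system("hdfs dfs -rm -r " $3) }
}'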

Solution 2: Using Shell script

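# Current time as seconds since the epoch
today=$(date +%s)
# Rows starting with "d" are directories; use grep "^-" to match regular files instead
hdfs dfs -ls /file/Path/ | grep "^d" | while read -r line; do
    # Column 6 is the modification date, column 8 the path
    dir_date=$(echo "${line}" | awk '{print $6}')
    difference=$(( (today - $(date -d "${dir_date}" +%s)) / (24*60*60) ))
    filePath=$(echo "${line}" | awk '{print $8}')

    if [ "${difference}" -gt 10 ]; then
        hdfs dfs -rm -r "${filePath}"
    fi
done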
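
The script above deletes as soon as it finds a match, so it may be worth adding a dry-run switch while testing it. The following is a minimal sketch, not part of the original answer: delete_listed and DRY_RUN are hypothetical names, and dry-run is made the default here so that nothing is removed by accident.

```shell
# delete_listed: read one HDFS path per line on stdin and remove each one.
# With DRY_RUN unset or set to 1, only print the command that would run.
# (Hypothetical helper for illustration.)
delete_listed() (
    while IFS= read -r path; do
        # Ignore empty lines.
        [ -n "$path" ] || continue
        if [ "${DRY_RUN:-1}" = 1 ]; then
            printf 'would run: hdfs dfs -rm -r %s\n' "$path"
        else
            # Redirect stdin as a precaution so the command cannot
            # consume the list of paths the loop is reading.
            hdfs dfs -rm -r "$path" </dev/null
        fi
    done
)

# Possible usage: feed it the paths selected by the age check, inspect the
# output, then rerun with DRY_RUN=0 to actually delete.
```

Keeping the deletion in one place also makes it easy to see which commands will touch HDFS before running them for real.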
Ani Menon answered Oct 18 '22 11:10