
How to delete files from the HDFS?

I just downloaded the Hortonworks sandbox VM, which includes Hadoop 2.7.1. I add some files using the

hadoop fs -put /hw1/* /hw1 

...command. After that I delete the added files with the

hadoop fs -rm /hw1/* 

...command, and then empty the recycle bin with the

hadoop fs -expunge 

...command. But the DFS Remaining space does not change after the recycle bin is emptied, even though I can see that the data really was removed from /hw1/ and from the recycle bin. I have the fs.trash.interval parameter set to 1.

In fact, I can still find all my data, split into chunks, in the /hadoop/hdfs/data/current/BP-2048114545-10.0.2.15-1445949559569/current/finalized/subdir0/subdir2 folder, which really surprises me, because I expected it to be deleted.

So my question is: how do I delete the data so that it is really deleted? After a few rounds of adding and deleting files, I have run out of free space.

asked Dec 07 '15 18:12 by serg




2 Answers

Try hadoop fs -rm -R URI

The -R option deletes the directory and everything under it recursively.
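
If the goal is to free space right away, the delete can also bypass the trash. The following is only a sketch, assuming the same /hw1 directory as in the question. -skipTrash is a standard option of -rm, while DRY_RUN and the run helper are hypothetical names added here so the command can be previewed before it is executed. As far as I know, DataNodes remove the underlying blocks asynchronously, so the reported free space may take a little while to catch up.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): print the command when DRY_RUN=1
# (the default here), otherwise execute it.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Delete /hw1 recursively and bypass the trash, so nothing waits for
# fs.trash.interval or a later -expunge.
run hadoop fs -rm -R -skipTrash /hw1
```

Run it with DRY_RUN=0 to actually perform the delete. Data removed with -skipTrash cannot be restored from the trash.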

answered Sep 22 '22 15:09 by BruceWayne


You can use

hdfs dfs -rm -R /path/to/HDFS/file 

since hadoop dfs has been deprecated in favor of hdfs dfs.
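
To check whether the space was actually reclaimed after the delete, the same hdfs entry point can report usage. This is a rough sketch, assuming the hdfs binary is on the PATH. The status variable is a hypothetical name used only so the script ends cleanly on a machine without Hadoop. No sample output is shown because the format differs between versions.

```shell
#!/bin/sh
# Report HDFS capacity. The guard lets the script exit cleanly where
# Hadoop is not installed.
if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -df -h /        # filesystem size, used and available space
    hdfs dfsadmin -report    # cluster and per-DataNode figures, including "DFS Remaining"
    status="checked"
else
    status="hdfs-not-found"
fi
echo "$status"
```

Running this before the delete and again a few minutes after gives two DFS Remaining values to compare.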

answered Sep 22 '22 15:09 by Giorgos Myrianthous