I just downloaded the Hortonworks sandbox VM; it ships with Hadoop 2.7.1. I am adding some files using the
hadoop fs -put /hw1/* /hw1
...command. After that I delete the added files with the
hadoop fs -rm /hw1/*
...command, and then empty the trash with the
hadoop fs -expunge
...command. But the DFS Remaining space does not change after the trash is emptied, even though I can see that the data really was removed from /hw1/ and from the trash. I have the fs.trash.interval parameter set to 1.
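For reference, the trash contents and the effective trash interval can be checked like this (assuming the sandbox's default root user, so the trash lives under /user/root/.Trash):
# List whatever is still sitting in the current user's trash
hdfs dfs -ls -R /user/root/.Trash
# Show the configured trash interval, in minutes
hdfs getconf -confKey fs.trash.interval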
In fact I can still find all my data, split into chunks, in the /hadoop/hdfs/data/current/BP-2048114545-10.0.2.15-1445949559569/current/finalized/subdir0/subdir2
folder, which really surprises me, because I expected it to be deleted.
So my question is: how do I delete the data in such a way that it is really deleted? After a few rounds of adding and deleting files I have exhausted the free space.
Any file stored in HDFS is split into blocks (chunks of data), and each block is replicated 3 times by default. When you delete a file, you only remove the metadata on the NameNode that points to those blocks. The blocks themselves are deleted by the DataNodes once no reference to them remains in the NameNode metadata; this happens asynchronously as the DataNodes process the deletion commands, so the space is not reclaimed instantly.
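A quick way to see which blocks a path still references, and on which DataNodes the replicas live (using the /hw1 directory from the question):
# Show files, their blocks, and the DataNodes holding each replica
hdfs fsck /hw1 -files -blocks -locations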
Use the hdfs dfs -ls command to list the files in a Hadoop archive by specifying the archive's location. Note that the modified parent argument causes the files to be archived relative to /user/.
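For example (the archive path below is only an illustration, not something from the question):
# List the contents of a Hadoop archive through the har:// filesystem
hdfs dfs -ls har:///user/zoo/foo.har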
Try hadoop fs -rm -R URI
The -R option deletes the directory and any content under it recursively.
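For example, using the /hw1 directory from the question; the -skipTrash variant bypasses the trash so the blocks can be freed without waiting for the trash interval:
# Recursively delete /hw1, moving it to the trash first
hadoop fs -rm -R /hw1
# Recursively delete /hw1 and bypass the trash entirely
hadoop fs -rm -R -skipTrash /hw1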
You can use
hdfs dfs -rm -R /path/to/HDFS/file
since hadoop dfs has been deprecated.
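If the files were already deleted into the trash, force-empty it and then confirm that the space has actually been reclaimed:
# Empty the trash immediately instead of waiting for fs.trash.interval
hdfs dfs -expunge
# Check the cluster-wide DFS Remaining figure
hdfs dfsadmin -report | grep "DFS Remaining"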