
Overwrite destination with hadoop fs mv?

Doing a quick test of the form

testfunc() {
    hadoop fs -rm /test001.txt
    hadoop fs -touchz /test001.txt
    hadoop fs -setfattr -n trusted.testfield -v "$(date +"%T")" /test001.txt
    hadoop fs -mv /test001.txt /tmp/.
    hadoop fs -getfattr -d /tmp/test001.txt
}
testfunc
testfunc

resulting in output

... during second function call
mv: '/tmp/test001.txt': File exists
# file: /tmp/test001.txt
trusted.testfield="<old timestamp from first call>"
...

It seems that, unlike mv on Linux, the hadoop fs -mv command does not overwrite the destination file if it already exists. Is there a way to force overwrite behavior? I suppose I could check for and delete the destination each time, but something like hadoop mv -overwrite <source> <dest> would be more convenient for my purposes.

** By the way, if I am interpreting the results incorrectly, or the behavior itself seems incorrect, let me know. I had assumed that overwriting was the default behavior, and I am asking this question because I was surprised that it seems not to be.

Asked by lampShadesDrifter, May 22 '18 00:05



1 Answer

I don't think there is a direct option to move and overwrite files from one HDFS location to another, although copying (the cp command) has an option to force overwriting (-f). According to the Apache Hadoop documentation (https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html), Hadoop is designed around a write-once-read-many model, which limits overwriting.
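In the absence of an overwrite flag for mv, the delete-then-move workaround mentioned in the question could be wrapped in a small shell function. This is only a sketch: hdfs_mv_overwrite is a hypothetical helper name, not a Hadoop command. It assumes the destination is given as a full file path (e.g. /tmp/test001.txt rather than /tmp/.), and the two steps are not atomic, so it is only safe when nothing else writes to the destination in between.

```shell
# Hypothetical helper (not part of Hadoop): move a file, replacing the destination.
# Usage: hdfs_mv_overwrite /test001.txt /tmp/test001.txt
hdfs_mv_overwrite() {
    # $1 = source path, $2 = destination file path (assumed not to be a directory)
    # -rm -f stays quiet and returns success when the destination does not exist yet
    hadoop fs -rm -f "$2" || return 1
    # With the old destination gone, the plain move can succeed
    hadoop fs -mv "$1" "$2"
}
```

If trash is enabled on the cluster, the removed destination may be moved to the trash rather than deleted outright; that is probably acceptable for a test like the one in the question, but worth knowing.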

Answered by Agung Sriwongo, Oct 05 '22 13:10