Doing a quick test of the form
testfunc() {
    hadoop fs -rm /test001.txt                # remove any leftover source file
    hadoop fs -touchz /test001.txt            # create an empty source file
    hadoop fs -setfattr -n trusted.testfield -v "$(date +"%T")" /test001.txt   # stamp it with the current time
    hadoop fs -mv /test001.txt /tmp/.         # try to move it into /tmp
    hadoop fs -getfattr -d /tmp/test001.txt   # inspect the xattr on the destination
}
testfunc
testfunc
resulting in output
... during second function call
mv: '/tmp/test001.txt': File exists
# file: /tmp/test001.txt
trusted.testfield="<old timestamp from first call>"
...
it seems that (unlike in Linux) the hadoop fs -mv
command does not overwrite the destination file if one already exists. Is there a way to force overwrite behavior? I suppose I could check for and delete the destination each time, but something like hadoop fs -mv -overwrite <source> <dest>
would be more convenient for my purposes.
** By the way, if I am interpreting the results incorrectly or the behavior only seems wrong, let me know (I had assumed that overwriting was the default behavior, and I am asking this question because I was surprised to find that it apparently is not).
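For reference, the delete-then-move workaround I have in mind would look something like this (just a sketch using the test paths from above; -rm -f suppresses the error when the destination does not yet exist):

# remove the destination first, then the move succeeds
hadoop fs -rm -f /tmp/test001.txt
hadoop fs -mv /test001.txt /tmp/test001.txt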
The copyFromLocal command copies files from the local file system to HDFS, similar to the -put command. It fails if the destination file already exists; add the -f flag to overwrite the destination.
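For example (the local and HDFS paths here are placeholders):

hadoop fs -copyFromLocal -f ./test001.txt /tmp/test001.txt   # -f overwrites /tmp/test001.txt if it exists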
There is no cd (change directory) command in the HDFS shell. Since there is no shell session with a working directory, you navigate manually by providing the complete path to the ls command and listing one level at a time.
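For example, to drill down a directory tree you list each level by its full path (paths are placeholders):

hadoop fs -ls /user           # list the top-level directory
hadoop fs -ls /user/hadoop    # then list a subdirectory by its complete path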
You can use the cp command in Hadoop. It is similar to the Linux cp command and copies files from one directory to another within the HDFS file system.
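For example (paths are placeholders):

hadoop fs -cp /tmp/test001.txt /backup/test001.txt      # plain copy; fails if the destination exists
hadoop fs -cp -f /tmp/test001.txt /backup/test001.txt   # -f overwrites an existing destination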
hadoop fs {args} is versatile and can work with many file systems, including HDFS; hdfs dfs {args} works only with HDFS. In summary, use hadoop fs for the general case, and hdfs dfs if your only need is to access HDFS.
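For example, on an HDFS path the two forms are interchangeable:

hadoop fs -ls /tmp   # generic file system shell; also works with local, S3, and other file systems
hdfs dfs -ls /tmp    # HDFS-specific; equivalent here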
I think there is no direct option to move and overwrite files from one HDFS location to another, although copying (the cp command) does have a force option (-f). The Apache Hadoop documentation (https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html) notes that HDFS is designed around a write-once-read-many model, which limits overwriting.
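So one way to emulate a forced move is to force-copy and then remove the source, at the cost of rewriting the data instead of doing a cheap rename (a sketch using the test paths from the question):

# copy with overwrite, then delete the source; note this is not atomic
hadoop fs -cp -f /test001.txt /tmp/test001.txt && hadoop fs -rm /test001.txt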