I want to transfer files from HDFS to the local filesystem of a different server that is not in the Hadoop cluster but is on the same network.
I could have done:
hadoop fs -copyToLocal <src> <dest>
and then scp/ftp <toMyFileServer>.
As the data is huge and space on the local filesystem of the Hadoop gateway machine is limited, I wanted to avoid this and send the data directly to my file server.
Please help with some pointers on how to handle this issue.
The Hadoop get command is used to copy files from HDFS to the local file system. Use hadoop fs -get or hdfs dfs -get, specifying the HDFS file path you want to copy from followed by the local file path you want to copy to.
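For example (the HDFS and local paths here are placeholders):

hdfs dfs -get /user/you/output/part-r-00000 /tmp/
# the older but equivalent form
hadoop fs -get /user/you/output/part-r-00000 /tmp/

Note that this still writes to the local disk of whichever machine runs the command, so on its own it does not avoid the space problem on the gateway.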
You can use the cp command in Hadoop. This command is similar to the Linux cp command and is used for copying files from one directory to another within the HDFS file system.
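For example, to copy within HDFS (paths are placeholders):

hadoop fs -cp /user/you/output /user/you/output_backup

Keep in mind that cp never leaves HDFS, so it does not move data to a local or remote filesystem.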
You can use the get command in HDFS. This command is used to copy files from the HDFS file system to the local file system; it is just the opposite of the put command.
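The two directions look like this (paths are placeholders):

hdfs dfs -put /tmp/localfile.txt /user/you/     # local file system -> HDFS
hdfs dfs -get /user/you/localfile.txt /tmp/     # HDFS -> local file system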
This is the simplest way to do it:
ssh <YOUR_HADOOP_GATEWAY> "hdfs dfs -cat <src_in_HDFS> " > <local_dst>
It works for binary files too.
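If you are running the command on the Hadoop gateway itself, you can stream in the other direction and push straight to your file server, again without staging anything on the gateway's local disk (the host and paths are placeholders):

hdfs dfs -cat <src_in_HDFS> | ssh you@fileserver "cat > <dst_on_fileserver>"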
So you probably have a directory with a bunch of part files as the output from your Hadoop program:
part-r-00000
part-r-00001
part-r-00002
part-r-00003
part-r-00004
So let's copy one part at a time:
for i in `seq 0 4`;
do
    # copy one part from HDFS to the gateway's local disk
    hadoop fs -copyToLocal output/part-r-0000$i ./
    # push it to the file server, then free the local space
    scp ./part-r-0000$i you@somewhere:/home/you/
    rm ./part-r-0000$i
done
You may have to look up how to handle the scp password prompt, or set up SSH key authentication so the loop runs unattended.
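If gateway disk space is the real constraint, the same loop can stream each part straight to the file server without ever writing it to the gateway's local disk (a sketch combining this with the ssh/cat approach above; the output path, part count, and destination are placeholders):

for i in `seq 0 4`;
do
    # stream each part from HDFS over ssh directly to the remote host
    hadoop fs -cat output/part-r-0000$i | ssh you@somewhere "cat > /home/you/part-r-0000$i"
done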