
Putting a remote file into Hadoop without copying it to local disk

I am writing a shell script to put data into Hadoop as soon as it is generated. I can ssh to my master node, copy the files to a folder there, and then put them into Hadoop. I am looking for a shell command that avoids copying the file to the local disk on the master node. To better explain what I need, here is what I have so far:

1) Copy the file to the master node's local disk:

scp test.txt username@masternode:/folderName/ 

I have already set up SSH key authentication, so no password is needed for this.

2) Use ssh to remotely execute the hadoop put command:

ssh username@masternode "hadoop dfs -put /folderName/test.txt hadoopFolderName/" 

What I am looking for is a way to pipe/combine these two steps into one and skip the copy of the file on the master node's local disk.

thanks

In other words, I want to pipe several commands in a way that I can

asked Jun 30 '12 00:06 by reza



1 Answer

Try this (untested):

cat test.txt | ssh username@masternode "hadoop dfs -put - hadoopFoldername/test.txt" 

I've used similar tricks to copy directories around:

tar cf - . | ssh remote "(cd /destination && tar xvf -)" 

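This sends the output of the local tar into the input of the remote tar. The hadoop command above works the same way: the - argument tells put to read from standard input, which ssh forwards from the local side.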
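For use inside a script, the same idea can be wrapped in a small function. This is only a sketch under a few assumptions: the function name stream_to_hdfs and the REMOTE variable are made up for illustration, the host and command are the ones from the question, and the destination path is assumed to contain no single quotes. It redirects the file into ssh's standard input instead of using cat, which has the same effect. On newer Hadoop releases the command is usually spelled hadoop fs or hdfs dfs rather than hadoop dfs.

```shell
#!/usr/bin/env bash
# Sketch: stream a local file into HDFS over ssh, with no temporary copy
# on the master node. REMOTE and stream_to_hdfs are illustrative names.

REMOTE="${REMOTE:-username@masternode}"

stream_to_hdfs() {
    local src="$1" dest="$2"

    # Stop early if the local file cannot be read, so nothing partial is sent.
    [ -r "$src" ] || { echo "cannot read $src" >&2; return 1; }

    # ssh forwards our stdin to the remote command; the "-" argument
    # tells put to read the file contents from stdin.
    ssh "$REMOTE" "hadoop dfs -put - '$dest'" < "$src"
}

# Example call (commented out so sourcing this file has no side effects):
# stream_to_hdfs test.txt hadoopFolderName/test.txt
```

The function returns the exit status of ssh, which in turn is the exit status of the remote hadoop command, so a calling script can check whether the put succeeded.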

answered Oct 02 '22 19:10 by sarnold