Stream data into hdfs directly without copying

I am looking for options for writing data directly into HDFS from Python, without first storing it on the local node and then using copyFromLocal.

I would like to use an HDFS file much like a local file, calling a write method with a line as its argument, something like the following:

   hdfs_file = hdfs.create("file_tmp")
   hdfs_file.write("Hello world\n")

Does anything exist that supports this use case?

asked Mar 16 '13 by 0xhacker


People also ask

Can Hadoop process streaming data?

Hadoop Streaming is a utility in the Hadoop ecosystem that lets users run a MapReduce job with executable scripts as the mapper and reducer. Hadoop Streaming is often confused with real-time streaming, but it is simply a utility that runs executable scripts within the MapReduce framework.
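
For illustration only, a typical Hadoop Streaming invocation looks roughly like the following; the jar location, input/output paths, and script names are placeholders rather than anything taken from the question or answer:

   hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
       -input /user/hadoop/input \
       -output /user/hadoop/output \
       -mapper mapper.py \
       -reducer reducer.py \
       -file mapper.py -file reducer.py

The -file options ship the mapper and reducer scripts to the worker nodes so they can be executed there.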

How do I transfer files to HDFS?

To copy a file from the local file system to HDFS, use hadoop fs -put or hdfs dfs -put. With the put command, specify the local file path to copy from, followed by the HDFS path to copy to. If the file already exists on HDFS, you will get an error message saying "File already exists".
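
For example (both paths here are placeholders):

   hdfs dfs -put /home/user/data.txt /user/hadoop/data.txt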

How do I copy data from Linux to HDFS?

Step 1: Make a directory in HDFS where you want to copy the file.
Step 2: Use the copyFromLocal command to copy the file into the HDFS /Hadoop_File directory.
Step 3: Check whether the file was copied successfully by listing that directory.
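
A sketch of those three steps as shell commands, with the local file name chosen purely for illustration:

   hdfs dfs -mkdir /Hadoop_File
   hdfs dfs -copyFromLocal /home/user/data.txt /Hadoop_File
   hdfs dfs -ls /Hadoop_File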


1 Answer

I'm not sure about a Python HDFS library, but you can always stream via a hadoop fs put command, using '-' as the source filename to denote copying from stdin:

hadoop fs -put - /path/to/file/in/hdfs.txt
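
A minimal sketch of driving that same command from Python by piping into its stdin; the destination path is a placeholder and the hadoop binary is assumed to be on the PATH:

   import subprocess

   # Start 'hadoop fs -put - <dest>' with a pipe attached to its stdin,
   # then stream data into it instead of writing a local file first.
   proc = subprocess.Popen(
       ["hadoop", "fs", "-put", "-", "/path/to/file/in/hdfs.txt"],
       stdin=subprocess.PIPE,
   )
   proc.stdin.write(b"Hello world\n")
   proc.stdin.close()
   proc.wait()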
answered Oct 11 '22 by Chris White