
Append data to existing file in HDFS Java


I'm having trouble appending data to an existing file in HDFS. If the file exists, I want to append a line; if not, I want to create a new file with the given name.

Here's my method to write into HDFS.

    if (!file.exists(path)) {
        file.createNewFile(path);
    }

    FSDataOutputStream fileOutputStream = file.append(path);

    BufferedWriter br = new BufferedWriter(new OutputStreamWriter(fileOutputStream));
    br.append("Content: " + content + "\n");
    br.close();

This method does write to HDFS and creates the file, but as I mentioned, it does not append.

This is how I test my method:

RunTimeCalculationHdfsWrite.hdfsWriteFile("RunTimeParserLoaderMapperTest2", "Error message test 2.2", context, null); 

The first parameter is the name of the file and the second is the message; the other two parameters are not important.

Does anyone have an idea what I'm missing or doing wrong?

kennechu asked Apr 10 '14 19:04



People also ask

Can we modify files already present in HDFS?

You cannot modify data once it is stored in HDFS, because HDFS follows a write-once-read-many model. You can only append to data that is already stored.

Does HDFS support append?

HDFS will append to the last block, not create a new block and copy the data from the old last block. This is not difficult because HDFS just uses a normal filesystem to write these block-files as normal files. Normal file systems have mechanisms for appending new data.


2 Answers

Actually, you can append to an HDFS file:

From the client's perspective, an append operation first calls append on DistributedFileSystem, which returns a stream object, FSDataOutputStream out. To append data to the file, the client calls out.write to write and out.close to close.

I checked HDFS sources, there is DistributedFileSystem#append method:

 FSDataOutputStream append(Path f, final int bufferSize, final Progressable progress) throws IOException 

For details, see presentation.
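The write-then-close step can be sketched with plain java.io, since FSDataOutputStream is an ordinary OutputStream. This is a minimal sketch, not the method from the question: the class name LineAppender and the helper writeLineAndClose are hypothetical, and the demo writes to an in-memory stream so it runs without a cluster. On HDFS you would pass the stream returned by append instead.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

public class LineAppender {

    // Hypothetical helper: writes one line to the stream, then closes it.
    // "out" can be any OutputStream, for example the stream returned by
    // FileSystem.append(path) on HDFS.
    public static void writeLineAndClose(OutputStream out, String content) throws IOException {
        // An explicit charset avoids depending on the platform default.
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
            writer.write("Content: " + content);
            writer.write('\n');
        } // closing the writer flushes it and closes the underlying stream
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the HDFS stream, used only for this demo.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeLineAndClose(sink, "Error message test 2.2");
        System.out.print(new String(sink.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Because the helper closes the stream it is given, each call needs a fresh stream from append.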

You can also append from the command line:

hdfs dfs -appendToFile <localsrc> ... <dst> 

Add lines directly from stdin:

echo "Line-to-add" | hdfs dfs -appendToFile - <dst> 
Mikhail Golubtsov answered Oct 28 '22 23:10



Solved..!!

Append is supported in HDFS.

You just need some configuration and a little code, as shown below:

Step 1: set dfs.support.append as true in hdfs-site.xml :

    <property>
        <name>dfs.support.append</name>
        <value>true</value>
    </property>

Stop all daemon services using stop-all.sh, then start them again using start-all.sh.

Step 2 (optional): only if you have a single-node cluster, set the replication factor to 1 as below:

Through command line :

./hdfs dfs -setrep -R 1 filepath/directory 

Or you can do the same at run time through Java code, using FileSystem.setReplication:

    fileSystem.setReplication(new Path(filePath), (short) 1);

Step 3 : Code for Creating/appending data into the file :

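    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // hdfsuri is the NameNode URI, assumed to be defined elsewhere in the class.
    public void createAppendHDFS() throws IOException {
        Configuration hadoopConfig = new Configuration();
        hadoopConfig.set("fs.defaultFS", hdfsuri);
        FileSystem fileSystem = FileSystem.get(hadoopConfig);
        String filePath = "/test/doc.txt";
        Path hdfsPath = new Path(filePath);

        FSDataOutputStream fileOutputStream = null;
        try {
            if (fileSystem.exists(hdfsPath)) {
                // Single-node cluster only (Step 2): lower the replication factor before appending.
                fileSystem.setReplication(hdfsPath, (short) 1);
                fileOutputStream = fileSystem.append(hdfsPath);
                fileOutputStream.writeBytes("appending into file. \n");
            } else {
                fileOutputStream = fileSystem.create(hdfsPath);
                fileOutputStream.writeBytes("creating and writing into file\n");
            }
        } finally {
            // Close the stream first, then the file system it belongs to.
            if (fileOutputStream != null) {
                fileOutputStream.close();
            }
            fileSystem.close();
        }
    }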
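A note on cleanup order: the output stream should be closed before the FileSystem it came from, so that buffered data is flushed while the connection is still open. try-with-resources gives that order automatically, because resources are closed in the reverse of their declaration order. The sketch below only illustrates the ordering; Tracked and CloseOrderDemo are hypothetical stand-ins, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.List;

public class CloseOrderDemo {

    // Hypothetical stand-in for a resource such as a FileSystem or an output stream.
    static final class Tracked implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Tracked(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add(name); // record when this resource is closed
        }
    }

    // Opens two resources and returns the order in which they were closed.
    public static List<String> closeOrder() {
        List<String> log = new ArrayList<>();
        try (Tracked fileSystem = new Tracked("fileSystem", log);
             Tracked stream = new Tracked("stream", log)) {
            // writing to the stream would happen here
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(closeOrder()); // prints [stream, fileSystem]
    }
}
```

With the real classes, declaring the FileSystem first and the FSDataOutputStream second in the same try header would close the stream before the file system.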

Kindly let me know for any other help.

Cheers.!!

Lovish chaudhary answered Oct 28 '22 22:10
