
How to update a file in HDFS

I know that HDFS follows a write-once, read-many model.
Given that, is there any way to update a file that is already stored in HDFS?

Thank you in advance!

Raj asked Aug 24 '16 17:08



2 Answers

Option 1:

If you just want to append to an existing file, do one of the following:

  1. Pipe the text in on stdin: echo "<Text to append>" | hdfs dfs -appendToFile - /user/hduser/myfile.txt

  2. Run hdfs dfs -appendToFile - /user/hduser/myfile.txt, type the text in the terminal, and press Ctrl+D when you are done.

Option 2:

Get the original file from HDFS to the local filesystem, modify it, and then put it back on HDFS, overwriting the old copy:

  1. hdfs dfs -get /user/hduser/myfile.txt

  2. vi myfile.txt  # or modify it with any other tool

  3. hdfs dfs -put -f myfile.txt /user/hduser/myfile.txt
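The three steps of Option 2 can be wrapped in a small script. What follows is only a minimal sketch, assuming bash and an hdfs client on the PATH. The name hdfs_edit_file is a hypothetical helper invented for this illustration, and the edit command is whatever you choose to pass in.

```shell
#!/usr/bin/env bash
# Sketch only: hdfs_edit_file is a hypothetical helper, not a Hadoop command.
# Usage: hdfs_edit_file <hdfs_path> <edit_command> [args...]
# The edit command is run with the local copy's path appended as its last argument.
hdfs_edit_file() {
  if [ "$#" -lt 2 ]; then
    echo "usage: hdfs_edit_file <hdfs_path> <edit_command> [args...]" >&2
    return 2
  fi
  local hdfs_path="$1"
  shift
  local workdir local_copy status=0
  workdir="$(mktemp -d)" || return 1
  local_copy="$workdir/$(basename "$hdfs_path")"

  # 1. Download, 2. edit locally, 3. upload with -f to overwrite the old file.
  # The chain stops at the first failing step, so a failed edit is never uploaded.
  hdfs dfs -get "$hdfs_path" "$local_copy" &&
    "$@" "$local_copy" &&
    hdfs dfs -put -f "$local_copy" "$hdfs_path" || status=1

  # Remove the local working copy whether or not the update succeeded.
  rm -rf "$workdir"
  return "$status"
}
```

For example, hdfs_edit_file /user/hduser/myfile.txt sed -i 's/old/new/' would replace text in place (the -i form shown assumes GNU sed). Note that the whole file travels to the local machine and back, so this suits small files better than large ones.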

PradeepKumbhar answered Oct 07 '22 20:10


If you only want to add lines, put the new lines in a local file and append it to the existing HDFS file:

hdfs dfs -appendToFile localfile /user/hadoop/hadoopfile

To modify any portion of a file that has already been written, you have three options:

  1. Get the file from HDFS, modify its contents locally, and upload the result back to HDFS:

    hdfs dfs -copyToLocal /hdfs/source/path /localfs/destination/path

    or

    hdfs dfs -cat /hdfs/source/path | modify...

  2. Use a processing framework such as MapReduce or Apache Spark to apply the update. The result is written as a new directory of files, after which you remove the old files. This is generally the best approach.

  3. Mount HDFS through the NFS Gateway or FUSE; both support append operations.

    NFS Gateway

    Hadoop FUSE: mountableHDFS allows HDFS to be mounted (on most flavors of Unix) as a standard file system using the mount command. Once mounted, the user can operate on an instance of HDFS using standard Unix utilities such as ‘ls’, ‘cd’, ‘cp’, ‘mkdir’, ‘find’, and ‘grep’.
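The -cat | modify... pipeline from the first option can be completed by streaming the filtered output back into HDFS, since -put accepts - to read from stdin. The following is a rough sketch under a few assumptions: bash, an hdfs client on the PATH, and a filter that reads stdin and writes stdout. The helper name hdfs_rewrite_stream and the temporary path suffix are hypothetical choices made for this example.

```shell
#!/usr/bin/env bash
# Sketch only: hdfs_rewrite_stream is a hypothetical helper, not a Hadoop command.
# Usage: hdfs_rewrite_stream <hdfs_path> <filter_command> [args...]
# The filter reads the old contents on stdin and writes the new contents to stdout.
hdfs_rewrite_stream() (
  # The body runs in a subshell so the option below does not leak to the caller.
  # Make the pipeline fail if any stage fails (bash; skipped where unsupported).
  if (set -o pipefail) 2>/dev/null; then set -o pipefail; fi

  if [ "$#" -lt 2 ]; then
    echo "usage: hdfs_rewrite_stream <hdfs_path> <filter_command> [args...]" >&2
    exit 2
  fi
  hdfs_path="$1"
  shift
  tmp_path="${hdfs_path}.rewrite.$$"

  # Reading and overwriting the same path in one pipeline is unsafe, so the
  # filtered output goes to a temporary HDFS path first ("-put - <dst>" reads stdin).
  if ! hdfs dfs -cat "$hdfs_path" | "$@" | hdfs dfs -put -f - "$tmp_path"; then
    hdfs dfs -rm -f "$tmp_path"
    exit 1
  fi

  # Swap the new file into place. This is not atomic: the path is briefly missing.
  hdfs dfs -rm "$hdfs_path" && hdfs dfs -mv "$tmp_path" "$hdfs_path"
)
```

For example, hdfs_rewrite_stream /hdfs/source/path tr 'a-z' 'A-Z' would upper-case the file without keeping a local copy. Because the old file is removed before the new one is renamed, readers may briefly see no file at that path; if that matters, writing to a new directory as in the second option is the safer route.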

MrElephant answered Oct 07 '22 20:10