getmerge command in hadoop datacopy

Tags:

hadoop

My aim is to read all the files that start with "trans" in a directory, merge them into a single file, and load that single file into an HDFS location.

My source directory is /user/cloudera/inputfiles/

Assume that the above directory contains many files, but I need only those that start with "trans".

My destination directory is /user/cloudera/transfiles/

So I tried the command below:

hadoop dfs -getmerge /user/cloudera/inputfiles/trans* /user/cloudera/transfiles/records.txt

but the above command does not work.

If I try the below command, it works:

hadoop dfs -getmerge /user/cloudera/inputfiles /user/cloudera/transfiles/records.txt

Any suggestion on how I can merge some files from an HDFS location and store the merged single file in another HDFS location?

asked Feb 25 '15 by Surender Raja

1 Answer

Below is the usage of the getmerge command:

Usage: hdfs dfs -getmerge <src> <localdst> [addnl]

Takes a source directory and a destination file as input and 
concatenates files in src into the destination local file. 
Optionally addnl can be set to enable adding a newline character at the
end of each file.

It expects a directory as the first parameter. Also note, from the usage above, that the destination is a local file, not an HDFS path, which is another reason your first command cannot work as written.
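If merging every file in the directory is acceptable, a minimal sketch would be to getmerge into a local file and then copy it back to HDFS (the local path /tmp/records.txt here is just an assumed example):

hadoop dfs -getmerge /user/cloudera/inputfiles /tmp/records.txt
hadoop dfs -copyFromLocal /tmp/records.txt /user/cloudera/transfiles/records.txt

But that picks up every file in the directory, not only the ones starting with "trans".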

Since you need only the files matching trans*, you can try the cat command like this:

hadoop dfs -cat /user/cloudera/inputfiles/trans* > /<local_fs_dir>/records.txt
hadoop dfs -copyFromLocal /<local_fs_dir>/records.txt /user/cloudera/transfiles/records.txt
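As a variation, you can avoid the intermediate local file by piping cat into put, which reads from stdin when the source is given as - (a sketch, assuming your Hadoop version supports this form of put):

hadoop dfs -cat /user/cloudera/inputfiles/trans* | hadoop dfs -put - /user/cloudera/transfiles/records.txt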
answered Oct 01 '22 by Ashish