I have an Elastic MapReduce job which writes some files to S3, and I want to concatenate all of them into a single text file.
Currently I'm manually copying the folder with all the files into our HDFS (hadoop fs -copyFromLocal), then running hadoop fs -getmerge and hadoop fs -copyToLocal to obtain the file.
Is there any way to use hadoop fs directly on S3?
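For reference, the manual workflow described above looks roughly like this (all paths and the bucket name are placeholders, not from the original question):

```shell
# 1. Copy the job output folder from the local machine into HDFS
#    (placeholder paths).
hadoop fs -copyFromLocal /local/job-output /user/me/job-output

# 2. Merge all the part files in that HDFS folder into one local file.
hadoop fs -getmerge /user/me/job-output /local/merged.txt
```

Note that -getmerge already writes its result to the local filesystem, so a separate -copyToLocal step is not strictly needed after it.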
Actually, the response suggesting getmerge is incorrect: getmerge expects a local destination and will not work with S3. If you try, it throws an IOException and responds with -getmerge: Wrong FS:.
Usage:
hadoop fs [generic options] -getmerge [-nl] <src> <localdst>
An easy way (if you are generating a small file that fits on the master machine) is to do the following:
Merge the file parts into a single file onto the local machine (Documentation)
hadoop fs -getmerge hdfs://[FILE] [LOCAL FILE]
Copy the result file to S3, and then delete the local file (Documentation)
hadoop fs -moveFromLocal [LOCAL FILE] s3n://bucket/key/of/file
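Putting the two steps above together, a minimal sketch might look like this (the HDFS path, bucket, and key are placeholders; this assumes the merged file fits on the master node's local disk):

```shell
# 1. Merge all part files of the job output into a single local file
#    (placeholder HDFS path and local filename).
hadoop fs -getmerge hdfs:///user/me/job-output merged.txt

# 2. Upload the merged file to S3 and delete the local copy in one step.
hadoop fs -moveFromLocal merged.txt s3n://my-bucket/output/merged.txt
```

On newer Hadoop/EMR versions the s3a:// scheme is generally preferred over s3n://, but the two-step pattern is the same.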